Abstract

The goal of this paper is to explicitly address the question of whether any true analogy exists between the central nervous system and the World Wide Web. By exploiting intuitive ideas from different fields ranging from neurobiology to sociology, we investigate and present some properties of different evolving networks whose dynamics are governed by the Hebbian learning rule. We intended to merge two different fields: the tools have been provided by the graph theory of evolving networks, while the properties of the Hebbian learning rule are provided by computational neuroscience (CNS) models. We studied the emerging connection structure of purely feed-forward and feedback networks.
0.1  Introduction


The last few years have witnessed an increasing demand for describing systems with complex structure and topology. These structures emerge in many different fields, ranging from social sciences to genetics. Probably the most complex network is inside us: the most exciting properties of our brain have a lot to do with the special connection system among its units. Although our knowledge of these building blocks is increasing, we are still far from a complete understanding of the structure [5]. At the same time, many intriguing questions emerge regarding the World Wide Web (WWW), which can qualify as a common container of human knowledge. This feature resulted in the intuitive idea of highlighting the similarity between these two phenomena. On the one hand, it has been suggested to use the mutual activity correlation in modeling organizational learning [4], while on the other hand similar structural characteristics of the web and of the only completely described nervous system, that of C. elegans, have been reported [6]. In this Technical Report we investigate the question of whether an evolving network governed by the Hebbian rule has the same or similar properties as those found by studying the web or social networks. Specifically, we studied the emergent small-world properties, like clusters and hubs, which are presumably responsible for the robustness of many real networks against different perturbations. Their presence may also explain the fragility of networks against the removal of central nodes, which has also been observed in many complex and important networks. The question is of central importance: in seeking the answer we hope to find in the near future some explanation of the fascinating robustness-plasticity dualism of the brain and of the fast associative memory system. Since we do not intend to do the tedious job of building real models according to the computational neuroscience (CNS) standards, we have built different networks of units with minimal complexity. Inputs with different information content have been provided to the system. The system responds in two ways: given an input, it yields some output and modifies its connections. This self-organizing tuning was governed by different modifications of the Hebbian learning rule. One group of the networks consisted of purely feed-forward connections (that is, units were organized in layers), while the other group had feedback connections as well.

By watching the evolving networks we have tried to define and localize structures similar to those that are used in the terminology describing the internet. This Technical Report is primarily meant to summarize the simulation results at different parameter settings. The Report is organized as follows: first we give a basic description of the most fundamental concepts we used. To make the understanding easier we also provide a detailed list of the notations used. The next sections detail the different models and present the governing equations. Finally, all simulation results are presented with no analysis. A descriptive analysis would require further simulations. The results may hopefully give a qualitative picture of the nature of the Hebbian network ('HebbWeb' in short).
0.2  Definitions

0.2.1  Concepts 

HebbWeb  A set of neurons with a simple Hebbian learning rule that receives some random external excitation. The connections among the neurons (or units or nodes) can be characterized by a weighted, directed graph. We address the question of whether the evolving network resembles the Web's structure in its connection complexity. We are especially interested in emerging clusters, since they are supposed to contribute to the appearance of the famous small world property of the Web [1].

Distance  the  path  with  minimum  cost  between  two  nodes.  The  cost  of  a  path 
is  the  sum  of  link  costs  on  the  path.  The  link  cost  is  1/w  where  w  is  the 
weight  between  the  nodes  of  the  link. 

Cluster  Set  of  nodes  which  are  strongly  connected  to  each  other  and  have  fewer 
links  to  the  other  nodes. 

Clustering coefficient  is defined for each neuron in the following way: the ratio of the number of existing links to the possible number of links (n * (n - 1)) between the neurons linked by the actual neuron, where n is the number of those neurons.

Hub and Authority  values calculated iteratively [3] for each neuron. The weight matrix is treated as a transition matrix after a structure-preserving normalization. One iteration is the following:

$$\mathrm{aut} = P^{T} \cdot \mathrm{hub}$$

$$\mathrm{hub} = P \cdot \mathrm{aut}$$

where $P$ is the normalized weight matrix, aut is the authority vector and hub is the hub vector.

Feedback layer  The neurons and the directed connections among them.

Feedforward layer  The connection layer between the input and the neurons. In pure feedback simulations the feedforward connection matrix is equal to the Identity matrix.
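The hub and authority iteration defined above can be sketched in a few lines. This is only an illustration, not the code used for the simulations. It assumes that `P[i][j]` holds the normalized weight of the link from neuron i to neuron j, it reads the first update as a multiplication by the transpose of P, and it rescales both vectors to unit sum after every step. The function names, the rescaling and the iteration count are all assumptions.

```python
def hub_authority(P, iterations=50):
    """Iterate aut = P^T * hub and hub = P * aut on a square matrix P.

    Assumption: P[i][j] is the normalized weight of the link i -> j.
    """
    n = len(P)
    hub = [1.0] * n
    aut = [1.0] * n
    for _ in range(iterations):
        # Authority of j: weighted sum of the hub values of nodes pointing to j.
        aut = [sum(P[i][j] * hub[i] for i in range(n)) for j in range(n)]
        # Hub of i: weighted sum of the authority values of nodes i points to.
        hub = [sum(P[i][j] * aut[j] for j in range(n)) for i in range(n)]
        # Rescaling to unit sum keeps the values bounded (an assumption).
        aut = _unit_sum(aut)
        hub = _unit_sum(hub)
    return hub, aut


def _unit_sum(vector):
    """Divide a vector by its sum; leave an all-zero vector unchanged."""
    total = sum(vector)
    return [v / total for v in vector] if total > 0 else vector


if __name__ == "__main__":
    # Hypothetical 3-node example: node 0 links to 1 and 2, node 1 links to 2.
    P = [[0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
    hub, aut = hub_authority(P)
    print("hub:", hub)
    print("aut:", aut)
```

In the small example, node 2 receives links but sends none, so it ends up with a hub value of zero and the largest authority value.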

0.2.2  Notations 

$W$:  The weight matrix of the feedback layer

$W^{ext}$:  The weight matrix of the feedforward layer

$a$:  The inner activity of the feedback layer

$a^{f}$:  The firing activity of the feedback layer

$N$:  The number of neurons in the net.

$r$:  The percentage of outer excitation relative to the number of neurons ($N$)

$\alpha$:  The learning rate of matrix $W$

$\beta$:  The learning rate of matrix $W^{ext}$

$\gamma$:  The decay factor of the inner activity, only used in the first model

cycle_length:  It defines the simulation length (number of update steps).

$K(.)$:  This kernel function describes the temporal correlation between nodes which have been active at different times.

$L$:  The length of the kernel function for negative time difference.

$r_A$:  The ratio of the amplitude of weakening and strengthening in the kernel function.

$\theta$:  Threshold parameter in the firing rule - its meaning is case-dependent

finite_input:  It is set to 0 if all neurons may get excited, that is, the sum of the external excitation is not bounded. Its value is 1 if this sum is bounded.

$x^{ext}(i)$:  The external input at time $i$.

0.3  Experiments 

We studied networks with a fixed number of nodes but changing connection strengths. This change is induced by random inputs from the external world and is principally governed by local, Hebbian-like interactions among the nodes or neurons. Our numerical studies aim at revealing the possible parameter dependence of the cluster creation and authority-hub properties and are not meant to allow for direct comparison with sophisticated CNS models. Specifically, we have studied the evolution of Hebbian networks with and without feedback connections as a function of some fundamental parameters, such as tuning (learning) parameters, input dynamics and intrinsic network properties (inhibition, bias, and so on).

0.3.1  Network  structure  and  dynamics 

Model  structure 

Our models consist of two layers of simplified model neurons. In some cases we used neurons whose output is simply a linear function of their activity, instead of using the more sophisticated integrate-and-fire (IAF) type neurons. In other cases we applied neurons with IAF-like behavior, but made an effort to eliminate as many parameters as we could. In addition to changing the neurons' internal dynamics, different architectures have been investigated. In the first model type one layer ensures the connection with the external world: its neurons can be excited by external random inputs. The other layer's units are driven by the input neurons' excitation, and the connections among the units are modified according to the activity of the sender and receiver units (see Figure 1). In addition to the pure feedforward model, the second model type also includes recurrent connections among the units of the internal layer (see Figure 2).
Figure 1:  Pure feedforward network (layers, from top to bottom: system output, internal neurons, input layer, external random input)

Figure 2:  Pure feedback network

In  all  cases  studied,  the  feedforward  connection  matrix  is  equal  to  the  Identity 
matrix. 


Input  generation 

We studied the system's behavior as a function of the inputs' spatio-temporal structure. In most cases the external input $x^{ext}$ consists of zeros and $N \cdot r$ ones at random positions (binary input), where $r$ defines the induction ratio. For the special case of sine wave inputs, neurons received a half sine wave input in each step (analogue input).

The excitation of the (internal) neurons takes place in two different modes. In the first case (finite_input = 1) the input to each neuron is weighted according to its connection strength. In other words, the sum of the input is kept unchanged after the transmission through the weighted connections. In the other case (finite_input = 0) the excitation of a neuron is simply the scalar product of the input vector and the weight vector of its incoming connections (the $i$th row of matrix $W^{ext}$ describes the $i$th neuron's connection strengths).
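The binary input and the two excitation modes can be illustrated with a short sketch. It is not the original simulation code. The bounded mode is interpreted here as rescaling the weighted input so that its sum equals the sum of the external input, which is only one possible reading of the description above. The function names and the use of a seeded random generator are assumptions.

```python
import random


def binary_input(n, r, rng):
    """Return a 0/1 vector of length n with round(n * r) ones at random places."""
    k = round(n * r)
    active = set(rng.sample(range(n), k))
    return [1.0 if i in active else 0.0 for i in range(n)]


def excitation(w_ext, x, finite_input):
    """Excitation of the internal neurons for an external input x.

    finite_input = 0: plain scalar product of x with each row of w_ext.
    finite_input = 1: the same values, rescaled so that their sum equals
    the sum of x (an assumed reading of the bounded-input mode).
    """
    raw = [sum(w * xi for w, xi in zip(row, x)) for row in w_ext]
    if not finite_input:
        return raw
    total_in, total_out = sum(x), sum(raw)
    if total_out == 0:
        return raw
    return [value * total_in / total_out for value in raw]


if __name__ == "__main__":
    rng = random.Random(0)
    x = binary_input(100, 0.1, rng)
    print("number of excited inputs:", int(sum(x)))
    w_ext = [[2.0, 0.0], [1.0, 1.0]]
    print(excitation(w_ext, [1.0, 1.0], finite_input=0))
    print(excitation(w_ext, [1.0, 1.0], finite_input=1))
```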

Model’s  dynamics 

The network has two responses to the external excitation. It not only generates output (which is a function of the activity measured in the internal layer), but also modifies the connections between and within the layers. We define two different output generation modes: in the first mode (which can be called analogue firing), the neurons broadcast a signal which is a linear function of their input. In the second mode (spiking or binary firing), they produce binary signals, which are non-linear functions of their input. The applied non-linearities will be detailed later (see subsection 0.3.2). The general update rule for the inner activity is as follows:

$$a(t) = a(t-1) + W a^{f}(t-1) + W^{ext} x^{ext}(t) \qquad (1)$$

When analogue input was applied, a decay factor $\gamma$ was introduced. The modified update rule is described in Eq. 2.

$$a(t) = (1-\gamma)\, a(t-1) + \gamma W a^{f}(t-1) + W^{ext} x^{ext}(t) \qquad (2)$$

where the superscript $f$ stands for firing, which can be one of the types listed above. In this mode the resulting neuron model resembles the leaky IAF models. After firing, the neuron's inner activity is set to 0 (reset activity).
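The two activity update rules and the reset can be written as a short sketch. This is an illustration under stated assumptions, not the original implementation: matrices are plain lists of rows, row i of each matrix holds the incoming connections of neuron i, and the names `update_activity` and `reset_fired` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def update_activity(a_prev, af_prev, W, W_ext, x_ext, gamma=None):
    """Return a(t) from Eq. 1 (gamma is None) or Eq. 2 (gamma given)."""
    recurrent = matvec(W, af_prev)    # W a^f(t-1)
    external = matvec(W_ext, x_ext)   # W^ext x^ext(t)
    if gamma is None:
        return [a + rec + ext
                for a, rec, ext in zip(a_prev, recurrent, external)]
    return [(1.0 - gamma) * a + gamma * rec + ext
            for a, rec, ext in zip(a_prev, recurrent, external)]


def reset_fired(activity, fired):
    """Set the inner activity of every neuron that fired back to 0."""
    return [0.0 if f else a for a, f in zip(activity, fired)]


if __name__ == "__main__":
    W = [[0.0, 0.0], [0.5, 0.0]]          # neuron 0 excites neuron 1
    identity = [[1.0, 0.0], [0.0, 1.0]]   # W^ext = I, as in the simulations
    a = update_activity([1.0, 0.0], [1.0, 0.0], W, identity, [0.0, 1.0])
    print(a)
    print(reset_fired(a, [True, False]))
```

Keeping the reset as a separate step leaves the choice of firing model open; any of the firing rules can produce the `fired` flags.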

The other response of the system is an incremental change in its architecture. We applied different versions of the so-called Hebbian learning rule for governing the tuning of each connection strength. The main inspiration behind this choice is that this rule seems to be one of the most general and fundamental principles defining neuronal plasticity. Furthermore, it is exclusively based on local interaction: there is no need to tune an external parameter or define some global optimization process [2].

The update rule for the feedforward matrix is the following:

$$W^{ext}(t) = W^{ext}(t-1) + \beta \sum_{i=0}^{t} \left( K(a^{f}(i), x^{ext}(t), t-i) + K(a^{f}(t), x^{ext}(i), i-t) \right)$$

where $\beta$ is the learning parameter and $K(.)$ denotes the specific kernel function defined by the specific version of the Hebbian rule. Its arguments are the firing activity at a given time $i$, the actual input and the time difference $t - i$ or $i - t$. This kernel describes the correlation between nodes which have been active at different times. If the information flow performed by the net has a specific direction, the kernel function cannot be symmetric over time. That is, 'pre' and 'post' active states can be assigned to the interacting nodes. Symmetric, linearly decreasing and asymmetric kernels have been studied.


Figure 3:  The symmetric kernel function (amplitude $K(\Delta)$ plotted against the time difference $\Delta$)

The kernel can be defined by the following parameters: the sum of the lengths of the weakening and the strengthening regions ($L = L^{+} + L^{-}$) and the ratio of the amplitudes ($r_A$).

The update rule for the feedback connection matrix is the following:

$$W(t) = W(t-1) + \alpha \sum_{i=0}^{t} \left( K(a^{f}(i), a^{f}(t), t-i) + K(a^{f}(t), a^{f}(i), i-t) \right) \qquad (3)$$

where $\alpha$ is the learning parameter and $K(.)$ denotes the specific kernel function defined by the specific version of the Hebbian rule. Its arguments are the firing activity at a given time $i$, the recent firing activity and the time difference $t - i$ or $i - t$.

Figure 4:  Kernel function with linear decrease

The kernel can be defined by the following parameters: the length of the weakening and the strengthening region ($L$) and the ratio of the amplitudes ($r_A$).

Figure 5:  The asymmetric kernel function

This kernel can be defined by the following parameters: the ratio of the lengths of the weakening and the strengthening regions ($r_L$) and the ratio of the amplitudes ($r_A$). In our simulations $L^{+}$ was fixed to 1 and $L^{-}$ was changed.
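To make the kernel-based update of the feedback matrix more concrete, the following sketch applies one step of the rule to a small network. It rests on several assumptions that are not taken from the report: the kernel is piecewise constant (amplitude 1 in the strengthening region, minus the amplitude ratio in the weakening region, 0 at zero time difference), the kernel value is multiplied by the product of the two firing activities, and `W[post][pre]` holds the weight of the link from neuron `pre` to neuron `post`. All function and parameter names are hypothetical.

```python
def kernel_amplitude(delta, l_plus=1, l_minus=1, r_a=2.0):
    """Piecewise-constant kernel amplitude for a time difference delta."""
    if 0 < delta <= l_plus:
        return 1.0            # strengthening region
    if -l_minus <= delta < 0:
        return -r_a           # weakening region, r_a times stronger
    return 0.0


def hebbian_update(W, history, alpha, **kernel_args):
    """Apply one step of the feedback update rule to W.

    history[i] is the firing vector a^f(i); history[-1] is a^f(t).
    W[post][pre] is the weight of the link pre -> post (an assumption).
    """
    t = len(history) - 1
    now = history[t]
    n = len(now)
    new_W = [row[:] for row in W]
    for i, past in enumerate(history):
        forward = kernel_amplitude(t - i, **kernel_args)    # term with t - i
        backward = kernel_amplitude(i - t, **kernel_args)   # term with i - t
        if forward == 0.0 and backward == 0.0:
            continue
        for post in range(n):
            for pre in range(n):
                new_W[post][pre] += alpha * (
                    forward * now[post] * past[pre]
                    + backward * past[post] * now[pre]
                )
    return new_W


if __name__ == "__main__":
    # Neuron 0 fired one step before neuron 1.
    history = [[1.0, 0.0], [0.0, 1.0]]
    W = [[0.0, 0.0], [0.0, 0.0]]
    for row in hebbian_update(W, history, alpha=0.1):
        print(row)
```

In the example the link from the earlier neuron to the later one is strengthened, while the reverse link is weakened. Whether negative weights are clipped is not specified here, so the sketch leaves them as they are.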

0.3.2  Default  values  and  model  set  up 

Different  firing  models  have  been  studied: 

1.  Simple threshold model: a neuron fires if its inner activity is larger than a fixed threshold ($\theta$).

2.  Probabilistic firing with threshold: a neuron fires if its inner activity is larger than a fixed threshold ($\theta$). Below that, the $i$th neuron fires with probability $a(i)/\theta$.

3.  Probabilistic firing with fixed average number of firing neurons: the sum of all inner activity is normalized to $N \cdot \theta$. The normalized activity of a neuron is the probability of its firing.
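The three firing models listed above can be sketched as follows. This is a minimal illustration rather than the original code. In the third model the normalized activity is clipped to 1 so that it can be used as a probability, and a network with no positive total activity is treated as silent; both choices are assumptions, as are the function names.

```python
import random


def simple_threshold(a, theta):
    """Model 1: a neuron fires if its inner activity exceeds theta."""
    return [ai > theta for ai in a]


def probabilistic_threshold(a, theta, rng):
    """Model 2: fire above theta, otherwise fire with probability a(i)/theta."""
    return [ai > theta or rng.random() < ai / theta for ai in a]


def fixed_average_firing(a, theta, rng):
    """Model 3: normalize the total activity to N * theta and use the
    normalized activity of each neuron as its firing probability."""
    total = sum(a)
    if total <= 0:
        return [False] * len(a)
    scale = len(a) * theta / total
    # Clipping to 1 is an assumption; the text does not say how values
    # above 1 are handled.
    return [rng.random() < min(1.0, ai * scale) for ai in a]


if __name__ == "__main__":
    rng = random.Random(42)
    activity = [0.02, 0.10, 0.15, 0.30]
    print(simple_threshold(activity, theta=0.13))
    print(probabilistic_threshold(activity, theta=0.13, rng=rng))
    print(fixed_average_firing(activity, theta=0.5, rng=rng))
```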
parameter       value
N               100
cycle_length    5000
$\alpha$        0.1
$\gamma$        0.1
r               0.1
$\theta$        0.13
L               2
$r_A$           2

Table 1:  Default values for analogue firing


Analogue  firing 

Default  values  (see  Table  1) 

During the simulations only the feedback layer's connections were changed, while $W^{ext}$ remained unchanged and equal to the identity, i.e. $\beta = 0$ and $W^{ext} = I$.

The  following  list  summarizes  the  experiments  with  analogue  output. 

1.  Kernel size and amplitude ratio (20020124T105349 - 20020124T130526), asymmetric kernel, simple threshold firing model, $L = [1, 2, 4]$, $r_A = 1 : .5 : 4$, $\theta = [.01 : .04 : .21]$

2.  Larger firing threshold (20020124T134702 - 20020124T144448), asymmetric kernel, simple threshold firing model, $L = 1$, $r_A = [3, 3.2]$, $\theta = [.19 : .02 : .41]$

3.  Study of linearly decreasing kernel (20020205T172012 - 20020206T113601), simple threshold firing model, $L = [2, 4, 8]$, $r_A = [2, 4, 8]$, $\theta = [.09, .13, .27]$

4.  Moving sine valued input (20020207T193755 - 20020207T232358), linearly decreasing kernel, probabilistic firing with threshold, $L = 2$, $r_A = 4$, $r = .1 : .2 : .9$, $\theta = [.5 : .5 : 4]$

Spiking  type  firing 

Changing from analogue to binary output is meaningful for several reasons. First, it is much more plausible from the biologist's point of view. Second, it is more feasible from the engineer's point of view. As the introduction of this non-linearity seems to complicate the picture, we intended to further simplify the model by eliminating some parameters. During the simulations the decay factor $\gamma$ (see Equation 2) proved to be dispensable. Its normalization effect has been substituted by a proper setting of the remaining parameters.

The idea behind using the feedforward layer is preferential attachment. This layer should strengthen its weights to neurons which fire after receiving external excitation.

The $\alpha$ and $\beta$ parameters are smaller than in the analogue firing case because the update is a sum of 0s and 1s and is independent of the actual values of the inner activity. The system was sensitive to the setting of these parameters: if they were too large, the weight distribution could not converge but started to behave in a chaotic manner. For default values see Table 2.
parameter       value
N               100
cycle_length    5000
$\alpha$        0.0001
$\beta$         0.001
r               0.1
$\theta$        0.5
L               2
$r_A$           2

Table 2:  Default values for spike like firing


Figure 6:  1 synchronous input

active_length = 10, inactive_length = 10, $N = 20$, $r = 0.3$. The first input is without any noise. In the second one 80 percent of the input is missing. In the third one 20 percent of the neurons get external noise.


We  studied  the  feedforward  and  feedback  layers  separately. 

Some special non-random inputs with given spatio-temporal structures have also been studied with the symmetric kernel. These were:

1.  Moving bar: $N \cdot r$ neurons received unity input; after 10 steps the bar was shifted by one neuron, and so on.

2.  Moving bar with noise: random noise, i.e. 0-1 flips, was added to the previous input with a given percentage.

3.  Synchronous input: $N \cdot r$ neurons got input for a given number of steps (active_length); after that no input was given to any neuron for a given interval (inactive_length). See Fig. 6.1.

4.  Synchronous input with noise: There are two types of noise: the input is not complete; neurons not belonging to the pattern also get external excitation. See Fig. 6.2-3.

5.  2 synchronous inputs (with noise): two inputs were mixed; they were synchronous with the time window, but had different period lengths. See Fig. 7.

6.  1 synchronous input and 1 out of synchronous work. See Fig. 8.
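The first and third input types in the list above can be generated with a few lines of code. The sketch below is only illustrative. It assumes that the synchronous pattern occupies the first N * r neurons and that the moving bar wraps around at the end of the layer; neither detail is specified in the text, and the function names are hypothetical.

```python
def synchronous_input(n, r, active_length, inactive_length, steps):
    """Pattern of round(n * r) neurons that is on for active_length steps,
    then off for inactive_length steps, repeated periodically."""
    k = round(n * r)
    pattern = [1.0] * k + [0.0] * (n - k)   # assumed: first k neurons
    silent = [0.0] * n
    period = active_length + inactive_length
    return [pattern[:] if t % period < active_length else silent[:]
            for t in range(steps)]


def moving_bar(n, r, steps, shift_every=10):
    """Bar of round(n * r) neurons, shifted by one neuron every
    shift_every steps (wrap-around at the border is an assumption)."""
    k = round(n * r)
    frames = []
    for t in range(steps):
        start = (t // shift_every) % n
        frame = [0.0] * n
        for j in range(k):
            frame[(start + j) % n] = 1.0
        frames.append(frame)
    return frames


if __name__ == "__main__":
    sync = synchronous_input(n=20, r=0.3, active_length=10,
                             inactive_length=10, steps=40)
    bar = moving_bar(n=20, r=0.3, steps=25)
    for t in (0, 10, 20):
        print(t, "".join(str(int(v)) for v in sync[t]),
              "".join(str(int(v)) for v in bar[t]))
```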


0.4  Measured  values 

The following types of graphs were generated:

1.  Histogram of the minimum path length of the Hebbian network.

2.  Norm of weight matrix vs. time

3.  The resulting weight matrix

4.  Histogram of the connection strengths. (Zero strength is not shown)

5.  Hub-Authority values and ratios for neurons: upper graph: the solid line is the authority value of the neuron, the dotted line is the hub value of the neuron; middle graph: the authority/hub ratio if the hub value $\neq 0$; lower graph: the hub/authority ratio if the authority value $\neq 0$.

6.  Clustering coefficients of the Hebbian (-o-) and random (-x-) network

7.  Global and local connectivity length of the Hebbian (-, -o-) and random (., .x.) network

8.  Distance plot of the Hebbian network: -: the ordered directed distances between the neurons, x: the number of different neurons belonging to at least one of the counted connections, normalized by the maximum distance.

Figure 7:  2 synchronous inputs

active_length = 10 (6), inactive_length = 14 (9), $N = 20$. The first input has 2 separate synchronous inputs with $r = 0.3$. The second input has 2 synchronous inputs with 1 common neuron and $r = 0.4$.

Figure 8:  1 synchronous and 1 out of synchronous work input

active_length = 10 (15), inactive_length = 14 (20), $N = 20$. The first input has 2 separate inputs with $r = 0.3$. The second input has 2 inputs with 1 common neuron and $r = 0.4$.
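The path-length histogram and the clustering coefficient plots of this section rely on the distance and clustering coefficient defined in section 0.2.1. A possible way to compute both quantities is sketched below. It is not the original analysis code. The sketch assumes that `weights[u][v]` is the weight of the directed link from u to v, that a link exists when its weight is positive, and that the neighbours of a neuron are all nodes linked to it in either direction. The distance uses Dijkstra's algorithm with link cost 1/w.

```python
import heapq


def shortest_distances(weights, source):
    """Minimum-cost distances from source, with link cost 1/w for w > 0."""
    n = len(weights)
    dist = [float("inf")] * n
    dist[source] = 0.0
    queue = [(0.0, source)]
    while queue:
        d, u = heapq.heappop(queue)
        if d > dist[u]:
            continue  # outdated queue entry
        for v in range(n):
            w = weights[u][v]
            if v != u and w > 0:
                candidate = d + 1.0 / w
                if candidate < dist[v]:
                    dist[v] = candidate
                    heapq.heappush(queue, (candidate, v))
    return dist


def clustering_coefficient(weights, node):
    """Existing links among the neighbours of node, divided by n * (n - 1)."""
    size = len(weights)
    neighbours = [j for j in range(size)
                  if j != node
                  and (weights[node][j] > 0 or weights[j][node] > 0)]
    n = len(neighbours)
    if n < 2:
        return 0.0  # no pair of neighbours to connect
    existing = sum(1 for u in neighbours for v in neighbours
                   if u != v and weights[u][v] > 0)
    return existing / (n * (n - 1))


if __name__ == "__main__":
    W = [[0.0, 2.0, 0.25],
         [0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
    print(shortest_distances(W, 0))
    print(clustering_coefficient(W, 0))
```

In the example the strong two-step route from node 0 to node 2 is shorter than the weak direct link, which is the intended effect of using 1/w as the link cost.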

0.5  Results 

The values of $\alpha$, $\beta$ and $\gamma$ are set so that the system converges fast if possible; the values in the default value tables only indicate the order of magnitude of these parameters.

0.5.1  Analogue  firing 

We have found that the net with this firing model always evolves to a net with random statistics if the input is random. The typical random statistics of these nets can be seen in Figure 9.

Figure 9:  Random statistics

For figure details see section 0.4. The parameter values are summarized in Table 1, firing model is 1, time window is 1.
Large initial activity

In this case the initial activity had a relatively large standard deviation (see Figure 10). The net had a short transient state that was clustered but not efficient, as can be seen in Figure 11. After this state the net evolved to a net with random statistics (see Figures 12, 13). The default parameter values of Table 1 have been applied, the firing model was the simple threshold model, and the kernel function was asymmetric.
Figure 11:  Large initial activity: transient state statistics

For figure details see section 0.4. The relatively big initial activity caused the appearance of a transient, clustered, but not efficient state after 40 steps.

Figure 12:  Large initial activity: statistics of almost random state

For figure details see section 0.4. The transient clustered state has almost completely disappeared after 200 steps.

Figure 13:  Large initial activity: random statistics after 500 steps

For figure details see section 0.4. After 500 steps the net had random statistics.
Kernel function length ($L$ parameter)

We have found that the larger the kernel time window, the faster the system evolves to a random net (see Figures 14-16). Also, when $L$ is increased, the learning factor of the $W$ matrix should be decreased to prevent oscillation, because the update values increase as $L$ increases. We used the default parameter values of Table 1, the firing model was the simple threshold model, and the kernel function was linearly decreasing.
Ratio of strengthening and weakening ($r_A$ parameter)

Four different behaviors emerged depending on the value of the ratio $r_A$:

•  inhibition strength $\ll 1$ → exponential growth of weights (see Figure 17)

•  inhibition strength $\approx 1$ → linear growth of weights (see Figure 18)

•  inhibition strength $\gg 1$, i.e. 2, 4 → stable system (see Figure 19)

•  inhibition strength $\gg 10$, i.e. 50, 100, ... → stable system but very few non-zero weights, changing step by step (see Figure 20)

We used the default parameter values of Table 1 except that the firing threshold was set to 0.09, the firing model was the simple threshold model, and the kernel function was linearly decreasing.
Figure 17:  Analogue: $r_A = 0.1$

For figure details see section 0.4. The input ratio ($r$) was set to 0.3. Because of the exponential growth (2nd plot) cycle_length was only 400.


Figure 18:  Analogue: $r_A$

For figure details see section 0.4.

Figure 20:  Analogue: $r_A$

For figure details see section 0.4.

Firing threshold with simple threshold firing model ($\theta$ parameter)

We have found 3 different convergence types depending on the threshold value. After eliminating $\gamma$ the regions disappeared and the 2nd behavior was observed in the whole parameter region.

1.  $\theta < \beta \cdot \mathrm{avg}(x^{ext})$: In this case, if a neuron gets excited externally it can immediately fire. The norm of $W$ descends from a higher value to a stable value (see Figure 21).

2.  $\beta \cdot \mathrm{avg}(x^{ext}) < \theta < 2 \cdot \beta \cdot \mathrm{avg}(x^{ext})$: In this case a neuron needs more external excitation or more feedback activity to be able to fire. The norm of $W$ ascends from a lower value to a stable value (see Figure 22).

3.  $\theta > 2 \cdot \beta \cdot \mathrm{avg}(x^{ext})$: Because of the exponential unlearning only a few neurons will be able to fire. These neurons take all the activity from the other neurons (see Figure 23).

We used the default parameter values of Table 1, the firing model was the simple threshold model, and the kernel function was linearly decreasing.
Figure 21:  Analogue: $\theta$ = 0.

For figure details see section 0.4.

Figure 23:  Analogue: $\theta$ = 0.

For figure details see section 0.4.

Firing models

The probabilistic firing model with threshold produced results like the simple threshold firing model with $\theta = 0.09$. The reason could be that in this case every neuron can fire with a nonzero probability (except those that have exactly 0 inner activity), so relatively many neurons may fire (see Figures 24 - 26). We kept the simple threshold firing model because of its simplicity. We used the default parameter values of Table 1; the kernel function was linearly decreasing.

Figure 26:  Analogue: probabilistic firing model with threshold, $\theta = 0.27$

For figure details see section 0.4.
Different kernels

There was no significant difference between the results with different kernels. See Figures 27 - 37. We used the default parameter values of Table 1; the firing model was the simple threshold one.




Figure  27:  Analogue:  asymmetric 

For  figure  details  see  section  0.4. 


window,  $\theta$ = 0.09


Figure  28:  Analogue:  asymmetric 

For  figure  details  see  section  0.4. 
Figure 29:  Analogue: asymmetric time window, $\theta = 0.27$

For figure details see section 0.4.

Figure 32:  Analogue: linearly decreasing

For figure details see section 0.4.

Figure 33:  Analogue: symmetric time window

For figure details see section 0.4.

Figure 34:  Analogue: symmetric time window

For figure details see section 0.4.

Figure 35:  Analogue: symmetric time window, $\theta = 0.27$

For figure details see section 0.4.

Figure 36:  Analogue: symmetric time window, $\theta = 0.01$, $r = 0.8$

For figure details see section 0.4. In the spike like firing model the system tried to learn some synchronous behavior; here this property is not really characteristic.

Figure 37:  Analogue: symmetric time window, $\theta = 0.09$, $r = 0.8$

For figure details see section 0.4. In the spike like firing model the system tried to learn some synchronous behavior; here this property is not really characteristic.


Moving sine valued input

The system learned a moving wave (see Figure 38). Default parameter values have been used, except for $\theta = 0.5$; the firing model is the probabilistic one with threshold and the kernel is linearly decreasing.

0.5.2  Spike  like  firing 

With the symmetric kernel the weight matrix will be symmetric; this means that the connections between the nodes are undirected.

Feedback layer

At first we investigated the symmetric time window with random input and probabilistic firing with a fixed average number of firing neurons. We have found that the most important parameters are the ratio of external excitation ($r$) and the ratio of the average number of firing neurons ($\theta$). We have found two different regions:

•  few neurons get external excitation or few neurons are able to fire: the system evolves to a random network (see Figures 39 - 41).

•  more than half of the neurons get external excitation and more than half of the neurons are able to fire: the system learns some synchronous sequence with fast change due to the random input (see Figure 42).

We used the default parameter values of Table 2 except for $\theta$ and $r$.
Figure 38:  Analogue: moving sine valued input

For figure details see section 0.4.

Figure 39:  Spike, feedback layer, symmetric kernel: random input, few neurons get external excitation and few neurons able to fire

For figure details see section 0.4. $\theta = 0.3$, $r = 0.3$.

Figure 40:  Spike, feedback layer, symmetric kernel: random input, few neurons get external excitation but more neurons able to fire

For figure details see section 0.4. $\theta = 0.7$, $r = 0.3$.

Figure 41:  Spike, feedback layer, symmetric kernel: random input, more neurons get external excitation but few neurons able to fire

For figure details see section 0.4. $\theta = 0.3$, $r = 0.7$.

Figure 42:  Spike, feedback layer, symmetric kernel: random input, more neurons get external excitation and more neurons able to fire

For figure details see section 0.4. $\theta = 0.7$, $r = 0.7$.


Next we investigated the symmetric kernel with synchronous inputs. We found that if the input is synchronous with L and τ_a then the network is able to learn which neurons work together. The system has a good associative property (Figures 45, 46), but its filtering of additive noise is not as strong (Figures 47, 48). If the input is not synchronous, the system sometimes finds the synchronous work and otherwise seems to be random (Figure 49), as in out-of-sync systems. With the 1st firing model the system learns the synchronous input (Figure 44), but with the 2nd one it learns something else (Figure 43). We used the default parameter values of Table 2.
Figure 47: Spike, feedback layer: synchronous input, 20 percent of neurons get external noise

For figure details see section 0.4. Simple threshold firing model. The system learns the complete input without noise.

Figure 49: Spike, feedback layer: out of synchronous work input without noise

For figure details see section 0.4. Simple threshold firing model. The system can’t learn the complete input.
Next we studied the symmetric kernel and the simple threshold firing model with 2 synchronous inputs. We found that if the inputs are synchronous with L and τ_a then the network is able to learn which neurons work together, for each input independently (Figure 50). If the inputs share some neurons then the input for the common neurons goes out of synchronous work, so the net won’t learn the work of those neurons (Figure 56). If one of the inputs is out of synchronous work then the net will learn just the synchronous one (Figures 58, 57). The noise filtering property is a little weaker than in the case of 1 synchronous input (Figures 53 - 55). The input association strength remained the same (Figures 51, 52). We used the default parameter values of Table 2.
Figure 50: Spike, feedback layer: 2 synchronous inputs without noise

For figure details see section 0.4. The system learns both complete inputs.

Figure 51: Spike, feedback layer: 2 synchronous inputs, 80 percent of input is missing

For figure details see section 0.4. The system learns both complete inputs.


Figure 52: Spike, feedback layer: 2 synchronous inputs, 90 percent of input is missing

For figure details see section 0.4. The system can learn neither complete input.

Figure 53: Spike, feedback layer: 2 synchronous inputs, 10 percent of neurons get external noise

For figure details see section 0.4. The system learns both complete inputs without noise.

Figure 56: Spike, feedback layer: 2 synchronous inputs without noise, with 1 common neuron

For figure details see section 0.4. The system learns both complete inputs without the common neuron. Here one input is 4 neurons wide, but the system learns only 3 neurons per input.

Figure 57: Spike, feedback layer: 1 synchronous input and 1 out of synchronous work without noise, with 1 common neuron

For figure details see section 0.4. The system learns the synchronous input without the common neuron. Here one input is 4 neurons wide, but the system learns only 3.

Figure 58: Spike, feedback layer: 1 synchronous input and separate 1 out of synchronous work without noise

For figure details see section 0.4. The system learns the synchronous input.
Next we investigated the symmetric kernel with the simple threshold firing model for a special synchronous input: the moving bar input. The i-th neuron gets external excitation in the time interval [j, j + barlength] if j mod N = i. We found the following different behaviors:

• Over a relatively large range of time window sizes the system is able to learn the time sequence of the bars. If L ≈ barlength the system is stable (Figures 60, 61). If L ≈ 2 * barlength the system is not stable but learns the time sequence (Figures 62, 63).

• A too short time window, i.e. L < barlength / 2, prevents the system from learning the sequence (Figure 59).

• A too large time window, i.e. L ≈ 2 * (N - barlength), causes some strange effects (Figures 64, 61).

• If L < N - barlength but L barlength then the system learns the time sequence and the together excited neurons too (Figure 66).

We used the default parameter values of Table 2 except for N = 20 and r = 0.3 (bar length is N * r = 6).
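The moving bar input described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `moving_bar_input` and its arguments are hypothetical, time is taken to be discrete, and the excitation interval is treated as half-open, [j, j + barlength), so that exactly barlength = N * r neurons are excited at every step.

```python
def moving_bar_input(n_neurons, bar_length, n_steps):
    """Return n_steps rows; row t holds a 0/1 excitation flag per neuron.

    Assumption: neuron i is excited during [j, j + bar_length) for every
    time j with j mod n_neurons == i, which is equivalent to the test
    (t - i) mod n_neurons < bar_length.
    """
    if not 0 < bar_length <= n_neurons:
        raise ValueError("bar_length must be between 1 and n_neurons")
    rows = []
    for t in range(n_steps):
        row = [1 if (t - i) % n_neurons < bar_length else 0
               for i in range(n_neurons)]
        rows.append(row)
    return rows


# Parameters quoted in the text: N = 20, bar length N * r = 6.
bar = moving_bar_input(20, 6, 40)
```

Each row contains a contiguous (wrapping) block of excited neurons that advances by one neuron per time step, so the pattern repeats with period N.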
Figure 59: Spike, feedback layer: moving bar input with L = 2

For figure details see section 0.4. The system can’t learn the time sequence because of too short time window.

Figure 60: Spike, feedback layer: moving bar input with L = 4

For figure details see section 0.4. The system learns the time sequence.


Figure 61: Spike, feedback layer: moving bar input with L = 6

For figure details see section 0.4. The system learns the time sequence.


Figure 63: Spike, feedback layer: moving bar input with L = 14

For figure details see section 0.4. The system learns the time sequence.


Figure 64: Spike, feedback layer: moving bar input with L = 18

For figure details see section 0.4. The system learns something, but not the sequence, because the time window is too large.
Feedforward layer

We investigated the purely feedforward network with random input and linearly decreasing kernel.

At first we used the probabilistic firing model with a fixed average number of firing neurons and varied r and θ. We found the following regions:

• if r < 0.5 the net will have random statistics (Figure 71).

• if r > 0.5 and θ < 0.2 then the net will have approximately as many horizontal rows as there are neurons that can fire, i.e. N * θ (Figures 67 - 70). The strength of the rows increases with the size of the time window (Figures 72, 73).

We used the default parameter values of Table 2 except for L = 2, r and θ.
Figure 72: Spike, feedforward layer: random input with r = 0.7 and θ = 0.02 and L = 10

For figure details see section 0.4. Now the kernel is large enough to generate rows.

Figure 73: Spike, feedforward layer: random input with r = 0.7 and θ = 0.1 and L = 10

For figure details see section 0.4. Now the kernel is large enough to generate rows.
Next we studied the simple threshold firing model. Depending on r and θ there are several possible cases; with θ = 0.5 and r = 0.9 the system was cleared, i.e. all weights were 0. See Figures 74 - 82. The finite input variable must be set to 1 to produce these results, otherwise the net had random statistics. We used the default parameter values of Table 2 except for L = 10, r and θ.
Figure 81: Spike, feedforward layer: random input with r = 0
θ = 0.5.

For figure details see section 0.4.


Summary

Conclusion on α, β

• If the values in the weight matrix are of the order of α (β) then the system cannot learn anything and becomes stabilized, independently of α (β).

• If the values in the weight matrix are some orders of magnitude larger than α (β) and the system starts to oscillate, then the system can be stabilized by decreasing α (β).
Abstract. Recently, a novel algorithm, Stage, has been proposed for optimization problems. Stage exhibits compelling performance on a variety of tasks. Stage makes use of a tunable function approximator (Fapp, e.g., an artificial neural network) to improve local search policies, and it can be seen as an efficient (and biased) model of reinforcement learning. However, Stage also has some disadvantages: it can be unstable, and it is a serial (i.e., non-parallel) technique. Another line of recent studies offers a solution here. These studies show that for NP-complete problems randomization of starting points is advantageous. Here, the combined technique of a genetic algorithm (GA) using local searches and Stage is presented. We have designed a three stage memory procedure, called Stagenis, for the search problem: (i) local search forms the short-term memory, (ii) Stage is the medium-term memory, which aims to uncover the underlying global structure of the search problem, (iii) GA is the long-term memory, which stabilizes and selects found structures and/or schemas of those structures. In this algorithm Stage is embedded into the GA using a new genetic operator, called Stagenis. Demonstrations, which show the combined advantages of these algorithms, are provided on benchmarks of the Boolean satisfiability problem. Connection to the NFL theorem is made.

Another direction of research is concerned with developing Fapps for fast evaluation of a family of optimization problems. It is shown that such Fapps can be developed, and gains in speed of optimization can be achieved. In particular, large gains can be achieved for satisfiability problems having hard and soft constraints, where some of the constraints are mandatory, whereas others are only desirable. The goal is to find rational choices fast. It was found, too, that the development of Fapps for families of problems with unknown families is not possible without specific further improvement of the algorithms. In particular, the winner-takes-all algorithm is not capable of partitioning the problem state into problem clusters with different Fapps belonging to each cluster. Our results point to the biased nature of the search applied by the global optimization scheme Stage. We shall argue that Stage or Stagenis can be augmented by a GA on top. In turn, the suggested four stage algorithm would have a GA so as not to lose the best solutions, Fapps to find the partitioning of the problem state into clusters of similar problems, and a GA for the optimization of the low-dimensional parameter space of the Fapps to overcome the biased nature of algorithm Stage. Computer demonstrations of this suggestion are outside the scope of the present project, where, according to the basic assumptions, clusters of satisfiability problems can be identified.
1. Introduction

Several engineering applications involve NP-hard optimization problems, in which a heuristic is used to find the optimum or near-optimum of a cost or objective function. (It will be assumed throughout the paper that a function Obj : X → ℝ is given on the state space X, and the goal is to find the minimum of Obj.) There are many approximation techniques aiming to overcome the time requirements of exhaustive searches. For a recent review of such approximations see, e.g., [Hochbaum, 1995].

Recently, a novel technique called Stage, using function approximators (FAPPs), e.g., artificial neural networks, and a specific approximation of reinforcement learning (RL, [Sutton and Barto, 1998]), has been suggested [Boyan, 1998, Boyan and Moore, 2000]. Stage, indeed, showed attractive properties in our studies on real-world engineering problems, too [Palotai et al., 2001, Ziegler et al., 2001b].

Stage  was  able  to  solve  hard  problems,  but  it  had  its  weaknesses.  Stage  can 
often  recognize  the  structure  of  the  state  space  and  make  use  of  this  structure  to 
guide  the  search  to  better  regions  of  the  search  space  quickly.  However,  it  can 
sometimes  be  unstable  and  unreliable.  Also,  Stage  is  not  a  parallel  technique. 

This motivates the development of our new algorithm, which combines genetic algorithm (GA) and Stage. In this version of GA, called GA with Stagenis (GAwStagenis, or simply GAS), the fitness of the individuals is determined as the best value found by a local search (LS) starting from the state the individual represents.

New individuals are introduced by the standard crossover operator of GA and by a new genetic operator ‘Stagenis’ (i.e., STAGE-assisted genesis). Mutation is covered by this Stagenis operation. Stage makes use of a FAPP to approximate and smoothen the objective function. GAS makes use of the same technique. In the Stagenis operation first a LS is executed on the FAPP and then an individual is generated at the minimum of this FAPP¹.

Several tests and measurements have been conducted to evaluate the performance of the new algorithm. We used Boolean satisfiability benchmarks from the DIMACS archive [DIM, 1992]. As local search policy, the algorithm WalkSat was used [Selman et al., 1996].

Another aspect of our work concerns experimental justification of recent findings on NP-hard problems [Gomes et al., 1998, Gomes et al., 2000]. We show that the approximated objective function improves performance of the randomization technique in this benchmark problem set. We shall consider this randomization approach, Stage, and Stagenis from the point of view of the No Free Lunch (NFL) theorem [Wolpert and Macready, 1997].

Report organization: In section 2 we first present the base algorithms, i.e. WalkSat, Stage and GA, laying special emphasis on their advantages and disadvantages that motivated our work. Section 3 describes how these algorithms are combined to form GAS. Rational choice considerations are provided in Section 4. Implementation details as well as the results of our empirical evaluation are covered in Section 5. In the discussion section (Section 6) we shall review the properties of our approach. Section 7 concludes the report. The Appendix contains the pseudo-code of some of the algorithms of the paper.

¹ A mapping which renders an action to a state of the state-space in a general search problem, e.g., a mapping which renders an operation to a population in GA, can be recast in the form of ‘policy’, the basic concept of RL [Sutton and Barto, 1998]. Oftentimes, we shall use the word policy to refer to such a mapping.

2.  The  base  algorithms:  WalkSat,  Stage  and  GA 

One of the most successful local search methods for solving satisfiability problems is WalkSat [Selman et al., 1996]. Therefore, we used it in all of our algorithms as LS policy.

Boolean satisfiability problems are often stated in Conjunctive Normal Form (CNF). A CNF contains elementary expressions made of single Boolean variables that might be negated. Elementary expressions are grouped in clauses by means of logical OR operators. The CNF formula contains such clauses connected by logical AND operators. The CNF is satisfiable if there is a consistent assignment of truth values to all the Boolean variables of the CNF that makes the whole CNF formula ‘true’. This also implies that all individual clauses of the CNF should be ‘true’, i.e., satisfied.

Given  a  CNF  formula,  WalkSat  conducts  a  random  walk  in  the  space  of  possible 
truth  assignments.  The  aim  is  to  minimize 

Obj(x)  =  number  of  clauses  unsatisfied  by  assignment  x. 

WalkSat works in rounds. It starts a round by selecting an unsatisfied clause randomly, and tries to flip each variable in this clause, one at a time. It selects the best change greedily according to the overall improvement of Obj. If no improvement is possible, then it selects stochastically either the least worsening step, or a random step. If the worsening step exceeds a limit then another random unsatisfied clause is tried. Details on WalkSat are given in the Appendix.

WalkSat uses 4 parameters; the default values are chosen according to [Boyan, 1998]:

Cutoff: The probability of stopping after each round. This parameter determines the expected average length of the trajectories, which is 1/cutoff. The default cutoff is the reciprocal value of the number of clauses, thus the expected average length of the trajectories equals the number of clauses.

Patience: The number of trials to find a next step suitable for starting a new round. If the state does not change for patience steps, then the algorithm terminates. The default value is 100.

Noise: If no improving step could be found, then with this probability a random step is chosen. The default value is 0.25.

Delta: The maximum allowed worsening of Obj if no improving step is possible. If this parameter is set to 0, then WalkSat is not able to leave local minima, thus resorting to a true local search behavior. The default value is 10.
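The round structure and the four parameters can be illustrated with a simplified sketch. It is not the pseudo-code of the Appendix: the signed-integer clause encoding (+v for variable v, -v for its negation), the function names, the way patience is counted (rounds whose step was rejected), and the `max_rounds` safety cap are all assumptions made for this illustration.

```python
import random


def clause_satisfied(clause, assignment):
    # A clause is a non-empty list of nonzero ints: +v is variable v, -v its negation.
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def count_unsatisfied(clauses, assignment):
    # Obj(x): number of clauses unsatisfied by assignment x.
    return sum(1 for c in clauses if not clause_satisfied(c, assignment))


def walksat(clauses, assignment, cutoff=None, patience=100, noise=0.25,
            delta=10, max_rounds=10_000, rng=None):
    """Simplified WalkSat-style search; returns (best assignment, best Obj)."""
    rng = rng or random.Random()
    if cutoff is None:
        cutoff = 1.0 / len(clauses)        # default: reciprocal of the clause count
    x = dict(assignment)                   # work on a copy of the start state
    current = count_unsatisfied(clauses, x)
    best_x, best = dict(x), current
    idle = 0                               # consecutive rounds without a state change
    for _ in range(max_rounds):            # max_rounds is a hypothetical safety cap
        if current == 0 or idle >= patience:
            break
        unsat = [c for c in clauses if not clause_satisfied(c, x)]
        clause = rng.choice(unsat)
        # Try flipping each variable of the chosen clause, one at a time.
        candidates = []
        for lit in clause:
            var = abs(lit)
            x[var] = not x[var]
            candidates.append((count_unsatisfied(clauses, x), var))
            x[var] = not x[var]
        new_obj, var = min(candidates)     # greedy choice (least worsening if no gain)
        if new_obj >= current and rng.random() < noise:
            new_obj, var = rng.choice(candidates)   # random step instead
        if new_obj - current <= delta:     # accept unless the worsening exceeds delta
            x[var] = not x[var]
            current, idle = new_obj, 0
            if current < best:
                best_x, best = dict(x), current
        else:
            idle += 1                      # rejected: another clause is tried next round
        if rng.random() < cutoff:          # stop with probability cutoff after each round
            break
    return best_x, best
```

Because each round ends the trajectory with probability cutoff, the number of rounds is geometrically distributed with mean 1/cutoff, which matches the expected trajectory length stated for the Cutoff parameter.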

In our implementation, the original GA (plainGA) consists of a population P of N individuals, where each individual is an element of X. In each cycle, a new population is made from the old one using three genetic operations: recombination, selection, and mutation. First, some pairs of individuals are selected for recombination with the roulette method (i.e. the probability of selection is proportional to Obj); from each pair two individuals of the new population are created using simple cross-over. The rest of the new population is filled up by keeping the best individuals of the old population. Mutation is performed either by altering some randomly chosen genes of some randomly chosen individuals of the new population or by replacing some individuals with random elements of X. At the end of the cycle, the old population is discarded, and the recently created population becomes the old population.
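One cycle of such a GA might look as follows. This is a minimal sketch, not the authors' implementation: individuals are assumed to be lists of Booleans, the selection weights are supplied by the caller (how they are derived from Obj is left open here), and the names `roulette_pick`, `one_point_crossover`, `next_population`, `n_pairs` and `mutation_rate` are hypothetical. Only the gene-altering form of mutation is shown.

```python
import random


def roulette_pick(weights, rng):
    """Return an index drawn with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1               # guard against floating point rounding


def one_point_crossover(a, b, rng):
    """Simple cross-over: swap the tails of two parents (needs length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def next_population(population, weights, n_pairs, mutation_rate, rng):
    """Build the new population from the old one (one GA cycle)."""
    if 2 * n_pairs > len(population):
        raise ValueError("too many pairs for this population size")
    new_pop = []
    # Recombination: each selected pair yields two new individuals.
    for _ in range(n_pairs):
        a = population[roulette_pick(weights, rng)]
        b = population[roulette_pick(weights, rng)]
        new_pop.extend(one_point_crossover(a, b, rng))
    # Selection: fill the rest with the best individuals of the old population.
    ranked = sorted(range(len(population)), key=lambda i: weights[i], reverse=True)
    for i in ranked[:len(population) - len(new_pop)]:
        new_pop.append(list(population[i]))
    # Mutation: flip randomly chosen genes.
    for individual in new_pop:
        for g in range(len(individual)):
            if rng.random() < mutation_rate:
                individual[g] = not individual[g]
    return new_pop
```

The returned list has the same size as the old population, so it can directly take the role of the old population in the next cycle.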

The fitness of an individual x can either be determined by Obj(x) or, alternatively, by the result of a LS starting from x. Using the latter approach, GA exploits and remembers the results of a particular LS policy π. Also, LS policy π improves the efficiency of GA by releasing the load imposed by precision [Ulder et al., 1994, Kammeyer and Belew, 1996, Freisleben and Merz, 1997, Dorne and Hao, 1998].

Stage [Boyan, 1998], like GA, works in cycles. Stage has an inherited LS policy π_g, builds (corrects) a FAPP from the result(s) of the LS, and makes use of its own LS policy on this FAPP, called FAPP-LS or FLS. The result of the FLS is subject to LS policy π_g, and so on. In each cycle, the FAPP is retrained using the trajectory data from the most recent LS run, and then another local search is performed on the freshly retrained FAPP. The FLS will produce a promising “smart restart” (SR) point for π_g:


    START --perform--> LS --produces--> trajectory --retrain--> FAPP
                        ^                                         |
                        |                                      perform
                        |                                         v
                        +------------ Smart Restart <----------- FLS


The FAPP is computed to approximate the ‘value’ of states given LS policy π_g: the worth of any state is the value that can be reached from that state using LS policy π_g. Such state-‘value’ pairs are created upon each LS and are used to approximate the ‘value-function’ by means of a FAPP. This FAPP depends on the LS policy, which is not subject to improvement. The FLS policy is subject to iteration².

If the same SR is found more than a pre-specified number of times, the algorithm resorts to random restarts (RR) to maintain exploration. The FAPP could be an artificial neural network, but for the sake of simplicity and efficiency, we used a simple quadratic approximator [Boyan, 1998]. Stage can make use of features, i.e., it can perform the approximation in a space of lower dimension (the feature space).
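The cycle described above (LS, retraining of the FAPP, FLS, smart restart, with a fall-back to random restarts) can be summarized in a short skeleton. This is a sketch under stated assumptions, not the published algorithm: all callables and parameter names (`local_search`, `fit_fapp`, `fapp_local_search`, `random_state`, `max_repeats`) are hypothetical, every visited state is labelled with the best value found during its LS run, and the FLS is started from the best state of that run.

```python
def stage(start, local_search, fit_fapp, fapp_local_search, random_state,
          n_cycles, max_repeats=3):
    """Skeleton of a Stage-like loop; returns (best state, best value).

    Assumed interfaces:
      local_search(state)          -> non-empty list of (state, obj) pairs
      fit_fapp(samples)            -> callable mapping a state to a predicted value
      fapp_local_search(fapp, s)   -> state proposed as smart restart (SR)
      random_state()               -> state used for a random restart (RR)
    """
    samples = []                              # (state, value) training pairs
    best_state, best_value = start, float("inf")
    state, last_sr, repeats = start, None, 0
    for _ in range(n_cycles):
        trajectory = local_search(state)
        end_state, end_value = min(trajectory, key=lambda pair: pair[1])
        if end_value < best_value:
            best_state, best_value = end_state, end_value
        # Label every visited state with the best value found during this LS.
        samples.extend((s, end_value) for s, _ in trajectory)
        fapp = fit_fapp(samples)              # retrain the FAPP
        sr = fapp_local_search(fapp, end_state)
        repeats = repeats + 1 if sr == last_sr else 1
        last_sr = sr
        if repeats > max_repeats:             # same SR found too often: random restart
            state, last_sr, repeats = random_state(), None, 0
        else:
            state = sr
    return best_state, best_value
```

Keeping the best state separately from the FAPP is a choice of this sketch; the loop itself, like Stage, only carries information forward through the approximator and the current restart point.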

The  advantages  of  Stage  are  manifold: 

• No a priori knowledge of the problem is necessary, i.e. any features, π_g policy, function approximator and FLS policy can be used.

•  If  a  priori  knowledge  is  available,  it  can  be  incorporated  into  the  LS  strategy 
(i.e.,  into  the  concept  of  neighbors)  or  into  the  form  of  features  (condensed 
forms  of  regularities,  or  invariances).  Features  may  lead  to  an  effective 
reduction  of  the  dimension  of  the  search  space.  This  property  of  Stage 
can  be  advantageous  to  overcome  complexity  and  the  dimensionality  of  the 
search  problem.  Even  very  simple  features  -  such  as  the  mean  and  variance 
of  state  variables  -  can  be  used  effectively. 

•  Stage  weights  the  features  automatically,  i.e.,  it  can  select  the  relevant 
features  and  discard  unimportant  ones. 


² Note that Stage makes use of RL terminology. The corresponding policy iteration, however, may not always satisfy convergence theorems of RL. For this reason, the approximated ‘value-function’ will be called ‘FAPP, which generates smart restarts’.

Figure 1. Instability of Stage (plot title: Par32-1-c-shortLS)

•  If  there  is  a  structure  in  the  objective  function,  then  Stage  may  unravel 
this  structure  and  may  make  use  of  it. 

•  If  no  structure  can  be  found,  Stage  gracefully  degrades  to  a  simple  random 
restart  strategy. 

However,  Stage  also  has  a  couple  of  disadvantages: 

• It is unstable, i.e. if the structure of the LS-smoothened objective function is more complex than the class of functions that the FAPP can approximate, then its predictions may oscillate wildly and chaotically, as the authors themselves have experienced in some engineering applications.

• It can definitively drift away from already found good states. Because the memory of Stage is limited to the FAPP, which is restricted in storage, Stage sometimes completely forgets the best state it found and drifts to worse regions of the state space. This phenomenon is caused by the fast but biased approach of Stage: an SR, for example, may start a long trajectory ending in a sub-optimal local minimum. The long trajectory may give rise to significant modifications in the FAPP, and subsequent SRs may be trapped in the attractor region of this local minimum. As an example, consider the run of Stage in Figure 1. The horizontal axis corresponds to the sequence of LS runs, the vertical axis shows the best objective values found by the individual LS runs. (Discrete dots are interconnected to enhance visibility.) The first LS runs produced results around 100, then the FAPP learned the structure of the local optima and gave better and better SRs. After the 23rd LS run, the FAPP stabilized at Obj = 13, and a semi-stationary operation went on until the 83rd run. At this point, however, the FAPP began a transient behavior, during which Obj temporarily worsened to 528. After the transient was gone at the 96th run, the final stationary operation began. No further LS found the previous optimum again. Stage stagnated around objective values of 20...22 until the program was stopped after 10^6 evaluations.

•  Even  if  structure  is  present,  it  may  remain  unobserved,  if  the  FAPP  is 
unable  to  represent  it.  Therefore,  the  choice  of  the  FAPP  is  important. 

•  Stage  is  inherently  serial.  The  FAPP  is  retrained  when  the  LS  is  completed 
(because  retraining  requires  the  best  value  found  during  the  LS),  the  FLS 
starts  when  retraining  is  completed,  and  the  LS  starts  when  the  completion 
of  the  FLS  generates  a  restart  point.  Therefore,  Stage  is  not  parallel  in 
its  current  form. 

On  the  other  hand,  genetic  algorithms  have  the  following  advantages: 

•  They  are  robust  to  noise  and  transients.  Since  the  population  can  store 
past  results,  and  thus  the  new  population  depends  not  only  on  the  current 
state,  but  also  on  the  past,  GA  performs  a  kind  of  temporal  integration 
during  the  search.  Therefore,  GA  can  be  made  robust  against  noise  and 
infrequent  transients.  GA  has  the  potential  to  stay  in  good  regions  of  X. 

• They are proven to converge in the limit and to make use of implicit parallelism. (See e.g. [Baum et al., 1995] and references therein.)

•  They  can  be  applied  to  any  optimization  problem,  without  any  a  priori 
knowledge. 

However,  they  have  the  following  disadvantages: 

• The effectiveness depends heavily on the encoding of states. Most notably, the recombination operation works effectively only if the encoding enables the identification of (loosely interacting) modules in the genome code.

•  The  effectiveness  depends  heavily  on  parameter  settings. 

•  GA  often  converges  very  slowly  because  of  the  high  number  of  unnecessary 
computations  (which  may  be  partly  attributed  to  also  following  not  really 
promising  directions).  This  is  a  negative  side-effect  of  the  integration  in 
time  domain,  and  thus  the  price  for  robustness. 

• The population often gets degenerated, i.e. the population loses its diversity. In this case, some good but not necessarily optimal traits spread across the whole population, preventing further exploration.

Some of the disadvantages of GA can be compensated using LSs [Ulder et al., 1994, Kammeyer and Belew, 1996, Freisleben and Merz, 1997, Dorne and Hao, 1998], as mentioned before. This change can boost the performance of GA. This is in accordance with recent results of Gomes et al. [Gomes et al., 1998, Gomes, 2000, Gomes et al., 2000], stating that in combinatorial search problems, ‘local search’ strategies (WalkSat, for example) benefit from frequent random restarts and many short parallel runs, especially in the case of heavy-tailed runtime distributions that are typical of NP-complete problems. Therefore, GA, which starts by launching parallel random LS runs, can be advantageous in NP-complete problems.

3.  GA  with  Stagenis  (GAS) 

The advantages and disadvantages of Stage and GA are largely complementary to each other, so that the unification of the two algorithms can be expected to be as stable and parallel as GA, and as smart as Stage. On the other hand, since genetic algorithms are bound to give a chance to less promising individuals as well (which is the source of their robustness), they tend to perform unnecessary computations, making GAS somewhat slower than Stage. Of course, this can be compensated in the case of GAS if enough parallel processors are available.

The combination of the two algorithms was realized by embedding Stage into the GA as a new genetic operation. That is, the genetic algorithm is kept as a framework, but a FAPP is trained with the trajectories of the LSs that are used to calculate the fitness of the new individuals. FLS is used to obtain new SR states.

These new SR states are injected into the population. We call this operation Stagenis (STAGE-assisted genesis). The population can be regarded as a container for the starting points of LSs. This solves the problem that Stage sometimes ‘forgets’ good solutions (see section 2), and thus stability can be improved. Moreover, the standard genetic operations also help optimizing the starting points: recombination generates new individuals from the old ones, and selection makes sure that only those individuals will be kept from which the most successful LSs can be started. Mutation was discarded in favor of Stagenis because SRs (or, in the worst case, RRs) suffice to guarantee variety.

GAS works as follows. First, a population P of N individuals is created. Some of
these individuals are created randomly, and their fitness is calculated by
conducting an LS starting from each of them. The LS trajectories are then used to
train the FAPP, and the remaining individuals are created using Stagenis.

In later cycles, the new population is always constructed from the old one by
using three operations: recombination, selection, and Stagenis. Recombination and
Stagenis create new individuals, each resulting in an LS and the retraining of the
FAPP. In our first implementation, we used Stagenis only for the creation of the
last individuals of the new population. However, it turned out to be useful to make
use of SRs as early as possible. Therefore, the Stagenis calls are distributed evenly
in the code that generates the new population.
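
The generation cycle described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the implementation
used in the paper (its pseudo-code is in the Appendix): states are bit tuples,
`trap` is a hypothetical toy objective, `TableFapp` is a deliberately crude
stand-in for the FAPP that uses a single feature, and the names `gas`, `keep`
and `stagenis_every` are invented for this example.

```python
import random

def neighbors(state):
    """All states that differ from `state` in exactly one bit."""
    return [state[:i] + (1 - state[i],) + state[i + 1:] for i in range(len(state))]

def local_search(start, score, max_steps=50):
    """Greedy descent on `score`; returns (end_state, visited_states)."""
    state, trajectory = start, [start]
    for _ in range(max_steps):
        best = min(neighbors(state), key=score)
        if score(best) >= score(state):
            break  # local optimum of `score`
        state = best
        trajectory.append(state)
    return state, trajectory

class TableFapp:
    """Toy FAPP: predicts the LS outcome from one feature, the number of ones."""
    def __init__(self):
        self.sums, self.counts = {}, {}

    def train(self, trajectory, outcome):
        for state in trajectory:
            f = sum(state)
            self.sums[f] = self.sums.get(f, 0.0) + outcome
            self.counts[f] = self.counts.get(f, 0) + 1

    def predict(self, state):
        f = sum(state)
        if f not in self.counts:
            return float("inf")  # unseen feature value: treated as unpromising
        return self.sums[f] / self.counts[f]

def trap(state):
    """Hypothetical objective (lower is better) with one deceptive local optimum."""
    return 0 if all(state) else 1 + sum(state)

def gas(objective, n_bits, pop_size=8, generations=5, keep=0.25,
        stagenis_every=4, seed=0):
    rng = random.Random(seed)
    fapp = TableFapp()

    def random_state():
        return tuple(rng.randint(0, 1) for _ in range(n_bits))

    def evaluate(start):
        # Fitness of an individual = objective reached by an LS started from it.
        end, trajectory = local_search(start, objective)
        fitness = objective(end)
        fapp.train(trajectory, fitness)  # the toy FAPP is cheap, so train at once
        return fitness, start

    def stagenis():
        # FLS: descend on the FAPP's prediction to obtain a new start state.
        end, _ = local_search(random_state(), fapp.predict)
        return end

    # Initial population: partly random, the rest created by Stagenis.
    population = [evaluate(random_state()) for _ in range(pop_size // 2)]
    while len(population) < pop_size:
        population.append(evaluate(stagenis()))

    history = []
    for _ in range(generations):
        population.sort(key=lambda ind: ind[0])
        new_population = population[:max(1, int(keep * pop_size))]  # selection
        while len(new_population) < pop_size:
            if len(new_population) % stagenis_every == 0:
                child = stagenis()  # Stagenis calls are interleaved
            else:
                a, b = rng.sample(population, 2)  # one-point recombination
                cut = rng.randrange(1, n_bits)
                child = a[1][:cut] + b[1][cut:]
            new_population.append(evaluate(child))
        population = new_population
        history.append(min(ind[0] for ind in population))
    best = min(population, key=lambda ind: ind[0])
    return best, history
```

Calling `gas(trap, n_bits=10)` returns the best `(fitness, start_state)` pair
and the best fitness per generation. Because the selected individuals are
carried over unchanged, that per-generation value cannot get worse in this
sketch.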

For the sake of efficiency, the FAPP is retrained as late as possible, i.e., when
a new individual has to be created with Stagenis. At this time, all the collected
information will be used by the FAPP to create the new Stagenis member. The
trajectories are cached until that moment, and the FAPP is retrained with
all recent trajectories at once.
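
The late retraining described here amounts to buffering trajectories and
flushing the buffer only when a prediction is requested. A minimal sketch,
assuming a model object with `train_batch` and `predict` methods (both names,
as well as `MeanModel` and `LazyFapp`, are hypothetical and chosen for this
example only):

```python
class MeanModel:
    """Hypothetical stand-in for a FAPP: predicts the mean of all seen outcomes."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def train_batch(self, batch):
        for trajectory, outcome in batch:
            # Every state of a trajectory is labelled with the LS outcome.
            self.total += outcome * len(trajectory)
            self.count += len(trajectory)

    def predict(self, state):
        return self.total / self.count if self.count else 0.0

class LazyFapp:
    """Caches trajectories and retrains only when a prediction is needed."""
    def __init__(self, model):
        self.model = model
        self.pending = []        # trajectories collected since the last retraining
        self.retrain_count = 0

    def add_trajectory(self, trajectory, outcome):
        self.pending.append((trajectory, outcome))  # cheap: no training here

    def predict(self, state):
        if self.pending:
            batch, self.pending = self.pending, []
            self.model.train_batch(batch)           # one retraining for the batch
            self.retrain_count += 1
        return self.model.predict(state)
```

With this wrapper, any number of LS runs between two Stagenis calls costs a
single retraining.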

More details on the implementation are given in our technical report [Ziegler et al., 2001a].
The pseudo-code of GAS can be found in the Appendix.

Parallel form Stage: Algorithm GAS. One important aspect of GAS is its
parallel form, which was one of its most important motivations. In the following
paragraphs we describe one of many possible schemes.

Our GA implementation is inherently parallel because the LS runs, which account
for almost all of the execution time, are completely independent and can thus be
executed by different processors, yielding a practically linear speed-up. On the
other hand, Stage is inherently serial; therefore, its parallel formulation is not
completely straightforward. Of course, the bottleneck is the FAPP, because it is a
central object that nevertheless has to be retrained by many processes, and is also
used to generate new SRs using the FLS.

Fortunately, FLSs can be made parallel just as well as LSs because they require
only read-only access to the FAPP, which can be granted to any number of processes
simultaneously. We propose to have an extra copy of the FAPP which is always
available for reading. This copy is updated in an atomic step after each retraining
in order to guarantee that it contains the latest information.

Therefore, the only problem is the retraining of the FAPP, because this definitely
requires exclusive write access. The main idea is to minimize the time of the
exclusive write access. The problem is that uploading the trajectory data, which
can easily be tens or hundreds of kilobytes, to the FAPP can take a long time.
Instead, we suggest that a copy of the FAPP should be downloaded, edited locally,
and then uploaded. The FAPP is usually a concise representation; for instance, in
the case of an incremental FAPP (such as linear architecture quadratic regression
[Boyan, 1998]), it is a constant-size matrix. This way, the time interval in which no
other write access is allowed can be very short. Furthermore, the processes could
pass the FAPP object to each other in a predefined order (assuming a kind of ring
topology), so that the uploading of the FAPP and the next downloading is actually
only one step of data transfer. Figure 2 shows the GAS algorithm in the proposed
architecture for 3 processors.
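
The read-only copy and the exclusive write access can be illustrated with
threads in a single Python process. This is a simplified sketch under stated
assumptions, not the proposed multi-processor architecture: `SharedFapp`,
`read_copy` and `retrain` are hypothetical names, the FAPP is any object that
can be deep-copied, the ring-style passing of the FAPP is not modelled, and
the publication step relies on the fact that rebinding an attribute is a
single reference assignment in CPython.

```python
import copy
import threading

class SharedFapp:
    """Keeps a master FAPP for writers and a published copy for readers."""
    def __init__(self, fapp):
        self._master = fapp
        self._published = copy.deepcopy(fapp)
        self._write_lock = threading.Lock()

    def read_copy(self):
        # Readers (LS/FLS runs) never take the lock; they must not modify the copy.
        return self._published

    def retrain(self, update):
        # Writers are serialized: only one retraining at a time.
        with self._write_lock:
            update(self._master)
            fresh = copy.deepcopy(self._master)
            self._published = fresh  # publish the new copy in one assignment
```

A reader that fetched its copy before a retraining simply keeps working with
the older version; the next call to `read_copy` sees the updated one.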

In the following calculations, we will use $t_{LS}$, $t_R$ and $t_S$ according to
Figure 2. $t_{LS}$ is the total execution time of the several LS and FLS runs that
a particular processor should perform without retraining the FAPP, $t_S$ is the
time necessary to send (or receive) the FAPP object, and $t_R$ is the time needed
to retrain the FAPP. Let $k$ denote the number of processors that can be used in
parallel; then the following inequality must hold for $k$:

$$2 t_S + t_{LS} > (k - 1) \cdot t_R + k \cdot t_S,$$

that is,

$$1 + \frac{t_{LS} + t_S}{t_R + t_S} > k. \qquad (1)$$


If the processes use shared memory to communicate (in which case the FAPP object
does not have to be passed physically between the processes) or the communication
time between the processes is negligible compared to the retraining time or the
execution time of the LS runs, i.e., $t_S \to 0$, then (1) transforms to:

$$1 + \frac{t_{LS}}{t_R} > k. \qquad (2)$$


That is, the number of usable processors is bounded by the ratio of the execution
time of the batched LS/FLS runs and the retraining time. $t_R$ is determined by
the type and size of the FAPP, but $t_{LS}$ can be chosen large enough to allow the
use of an arbitrary number of processors and achieve almost linear speed-up. As a
consequence, the trajectory of an LS run will not be used immediately to retrain
the FAPP, but in the worst case only after $t_{LS}$ time.

On the other hand, if the number of available processors is given and the optimal
number of batched LS/FLS runs, i.e., $t_{LS}$, should be determined, (2) can be
rephrased:

$$t_{LS} > (k - 1) \cdot t_R. \qquad (3)$$

To use the trajectories of the LS runs as soon as possible, $t_{LS}$ should be
chosen as small as possible while still satisfying (3).
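
As a worked illustration of inequalities (1) and (3), the two bounds can be
evaluated numerically. The function names and the sample timings below are
assumptions made for this example; the inequalities are treated as strict, as
printed.

```python
import math

def max_processors(t_ls, t_r, t_s=0.0):
    """Largest integer k with k < 1 + (t_ls + t_s) / (t_r + t_s), cf. (1)."""
    bound = 1.0 + (t_ls + t_s) / (t_r + t_s)
    k = math.ceil(bound) - 1  # largest integer strictly below the bound
    return max(k, 1)

def min_batch_time(k, t_r):
    """Lower limit of t_ls for k processors, cf. (3): t_ls > (k - 1) * t_r."""
    return (k - 1) * t_r
```

For example, with a batch time of 40 time units, a retraining time of 10 and
negligible communication, `max_processors(40, 10)` gives 4: four processors
satisfy 40 > 3 · 10, while five would require 40 > 40.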


Figure 2. Parallel execution of the GAS algorithm in the proposed
architecture for 3 processors

4. Considerations on rational choice

In many cases, constraints can be formulated in the form of soft satisfiability
instances. The easiest way is to assign a cost to every clause and to search for
low-cost solutions. Hard constraints must be given costs that are above the
threshold, whereas soft constraints may assume real costs in terms of dollars
or time. We have studied such examples for Stage and for Fapps developed on
different problem families. Fapps of different problem families were cross-checked
against each other. Such optimization should be able to find rational choices much
faster than optimal solutions.
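
The clause-cost formulation can be made concrete with a short sketch. It
assumes DIMACS-style clauses (tuples of signed integers) and an assignment
given as a dictionary from variable number to truth value; the function names
and the example costs are hypothetical.

```python
def assignment_cost(clauses, costs, assignment):
    """Sum of the costs of all clauses violated by `assignment`."""
    total = 0.0
    for clause, cost in zip(clauses, costs):
        # A literal l is true if variable |l| has the truth value (l > 0).
        satisfied = any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        if not satisfied:
            total += cost
    return total

def is_acceptable(clauses, costs, assignment, threshold):
    """A 'rational choice' keeps the total cost at or below the threshold."""
    return assignment_cost(clauses, costs, assignment) <= threshold
```

If a hard clause is given a cost above the threshold (say 100 with a threshold
of 12), any assignment violating it is rejected automatically, while violated
soft clauses merely add up.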

The particular feature set of our problems was chosen as follows:

• proportion of clauses currently unsatisfied

• proportion of clauses satisfied by exactly 1 variable

• proportion of clauses satisfied by exactly 2 variables

• proportion of variables that would break a clause if flipped

• proportion of the variable set identical to their initial ‘naive’ setting
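
The five features listed above could be computed as in the following sketch.
Several points are assumptions of this example rather than statements of the
paper: clauses are DIMACS-style tuples of signed integers, "satisfied by
exactly 1 variable" is read as "exactly one true literal", a variable "breaks
a clause if flipped" when it is the only true literal of some clause, and the
function name `sat_features` is hypothetical.

```python
def sat_features(clauses, assignment, initial):
    """Return the five feature values for `assignment` (dict: variable -> bool)."""
    def is_true(lit):
        return assignment[abs(lit)] == (lit > 0)

    true_counts = [sum(is_true(lit) for lit in clause) for clause in clauses]
    n_clauses, n_vars = len(clauses), len(assignment)

    # Variables that are the sole true literal of at least one clause.
    breakers = set()
    for clause, n_true in zip(clauses, true_counts):
        if n_true == 1:
            breakers.update(abs(lit) for lit in clause if is_true(lit))

    unchanged = sum(assignment[v] == initial[v] for v in assignment)
    return [
        true_counts.count(0) / n_clauses,  # clauses currently unsatisfied
        true_counts.count(1) / n_clauses,  # satisfied by exactly one literal
        true_counts.count(2) / n_clauses,  # satisfied by exactly two literals
        len(breakers) / n_vars,            # variables whose flip breaks a clause
        unchanged / n_vars,                # variables still at their initial value
    ]
```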

Experiments  on  ‘rational  choice’  can  be  found  in  Subsection  5.2. 

5.  Empirical  evaluation 

5.1. Experimenting with Stagenis. A test program was implemented that can
solve satisfiability problems with different heuristic algorithms. These
algorithms include i) Stage using WalkSat as local search; ii) GAS as described in
Section 3; iii) a special version of the genetic algorithm, called GAwRR, using
random restarts as a special mutation operator; and iv) for the sake of comparison,
the pure WalkSat algorithm with multiple random restarts (see the Appendix for the
pseudo-code of these algorithms).

These algorithms have many inner parameters that can influence their performance
drastically. We did not intend to fine-tune them for the actual problem
(which is, in fact, another search problem), so most of these parameters were set
to a default value and remained unchanged in our tests.

One exception is the cutoff parameter of the WalkSat algorithm, for which we tried
two versions. Since all of these algorithms use WalkSat as their inner local search
heuristic, it is an input parameter for the whole program. The first version is called
longLS, where cutoff is set to 1/number_of_clauses, while in shortLS it is set
to min(0.5, 100/number_of_clauses). Some of the tests were aimed at deciding
whether shortLS or longLS is better for our benchmark problems.

The remaining parameters were configured as follows:

• The delta parameter of WalkSat was set to zero, so that WalkSat became
a simple LS strategy unable to leave local optima. The patience parameter
was set to 100, and the noise was set to 0.25.

• In the GAwRR and GAS algorithms, the recombination rate was set to
35% (i.e., 70% of the new population is created using recombination), and the
stagenis rate was set to 5%; the saved best part of the population was hence
25%. Another idea was to rely more on the FAPP, so in a separate
experiment we increased the stagenis rate to 70% and completely omitted
recombination. This version was called GAwStagenis-no-recomb. To
illustrate the power of these algorithms we also ran plainGA (stagenis rate = 0%
and without LS).

• Stage used a quadratic FAPP.

To  be  able  to  compare  the  speed  and  effectiveness  of  these  algorithms,  we  introduced 
machine-independent  step-parameters: 

#Evals:  This  counter  is  the  number  of  evaluated  states  counted  separately 
both  for  LS  and  for  FLS. 

#LS:  This  counter  is  the  number  of  LS  runs,  counted  separately  both  for  the 
LS  and  for  FLS. 

#Gens:  This  counter  is  the  number  of  generations  for  GA  type  algorithms, 
or  the  number  of  cycles  for  Stage,  respectively. 

The termination criteria of the program can be set explicitly by means of these
counters. In most cases, $10^7 \ldots 10^8$ state evaluations (#Evals) were allowed.

To sum up, we ran the following algorithms: Stage, GAS, GAwRR, plainGA,
GAS-no-recomb and WalkSat. It would be impossible to compress the results of all
six algorithms into one table or figure, so in most cases only the relevant results
are shown for the sake of clarity.

In the case of the harder problems, none of the algorithms was able to find an
exact solution. Consequently, it is meaningless to define the running time of these
algorithms; they were stopped after a predefined number of steps. However, we note
that it takes them approximately a day to make $10^8$ steps (i.e., evaluated states)
on the largest problems.

For the tests, the satisfiability benchmarks from the DIMACS archive [DIM, 1992]
were used. Only satisfiable instances were considered, because in this case the global
optimum is known in advance. In the case of unsatisfiable problems it is hard to
assess the best solution found, since it should be compared to the real optimum,
which is unknown.

The results of the first mass tests can be seen in Table 2, which is located at the
end of the paper due to its size. These tests consist of both easy (U16d2, U32a1,
U32c3, U8a4, U8b3) and harder (J2000, par16, par32) problems to detect the limits
of each algorithm. Note that our goal is to develop a parallel algorithm, whereas
the test results reflect performance on serial machines. It can be seen that the
simple WalkSat is very effective on smaller problems, although it can fail on both
types. It can be stated that WalkSat is a good heuristic for solving satisfiability
problems, so it is worth using it in the more complex algorithms as the LS
heuristic. From now on we focus on problems which are harder for WalkSat, and
investigate how the other algorithms perform.

Table 1. Summary of the results of Stage, GAS and plainGA
on problem instances par16-1, par16-2

            STAGE - shortLS       GAS - shortLS
Problem     BSF     #Evals        BSF     #Evals
par16-1     26      1·10^7        15      6·10^6
par16-2     27      1·10^7        48      1·10^7

            STAGE - longLS        GAS - longLS
Problem     BSF     #Evals        BSF     #Evals
par16-1     6       3·10^7        11      3·10^7
par16-2     5       3·10^7        8       3·10^7

            plainGA
Problem     BSF     #Evals        #Gens
par16-1     298     1·10^6        6097
par16-2     288     1·10^6        6052

Legend: BSF = Obj of best state found, #Evals = number of evaluated states,
#Gens = number of generations


It was not clear whether shortLS or longLS is better; it seems to depend
on the current problem and on the current run. Finding the optimal length of the
LS runs for each problem was beyond the scope of our studies.

In most cases GAS outperforms GAwRR after some generations, showing that
GA can effectively use the information gained from the FAPP. It can be seen that
Stage and GAS produce the best results, in accordance with our expectations. A
possible explanation of why GAS is not better than Stage could be that these
problems may have a structure recognizable by the FAPP, and Stage uses the FAPP
more intensively than GAS during the same number of evaluated states. To
investigate this further, in the next round of test cases we thoroughly examined
Stage, GAS, and plainGA on two relatively hard problems (par16-1 and par16-2).

The best objectives found by each algorithm are summarized in Table 1. The
results of plainGA were at least an order of magnitude worse than those of the
other algorithms, indicating that this problem family is not favorable for GA in
this state encoding, and that both LS and Stagenis can improve it considerably.

To better understand the behavior of Stage and GAS, consider Figs. 3 and 4,
where the averaged best-objective-so-far values and the best and worst case runs
can be seen. The vertical lines indicate the first 5 generation changes to picture the
length of a generation (which is approximately constant during the runs, although
it seems to shorten because of the logarithmic-scale diagram). The numbers in
parentheses give the number of averaged runs. At the beginning GAS starts better,
especially in the shortLS case, but at the end Stage achieves slightly better results
in the averaged values. An advantage of GAS is that it performs more reliably,
i.e., the deviation between the best and worst runs is significantly smaller than in
the case of Stage, especially around $10^4 \ldots 10^5$ evaluations.

Figure 3. Comparison of average performances of Stage, GAS
and GAS-no-recomb on par16-1. The birth times of new
generations are depicted by vertical lines (gen. step) on the
upper panel. The lines become frequent at the right-hand side of the
figure; these frequent lines are not shown.
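
The averaged, best-case and worst-case best-objective-so-far curves discussed
for Figs. 3 and 4 can be derived from raw run logs along the following lines.
This is a sketch under the assumption that every run is logged as a list of
objective values, one per evaluated state, and that all runs have the same
length; the function names are hypothetical.

```python
from itertools import accumulate

def best_so_far(values):
    """Running minimum of the objective values of one run."""
    return list(accumulate(values, min))

def summarize_runs(runs):
    """Return (mean, best, worst) best-so-far curves over several runs."""
    curves = [best_so_far(run) for run in runs]
    columns = list(zip(*curves))  # one tuple of values per evaluation step
    mean = [sum(col) / len(col) for col in columns]
    best = [min(col) for col in columns]
    worst = [max(col) for col in columns]
    return mean, best, worst
```

The gap between the `best` and `worst` curves at a given number of
evaluations is one simple way to quantify the reliability mentioned above.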

The  surprisingly  good  performance  of  Stage  led  us  to  try  GAS-no-recomb  in 
order  to  use  the  power  of  the  FAPP  more  effectively.  Unfortunately  the  results  do 
not  seem  to  confirm  this  idea.  It  can  be  seen  that  GAS-no-recomb  is  worse  than 
GAS,  hence  it  may  not  be  possible  to  raise  the  performance  of  GAS  by  simply 
increasing  the  Stagenis  rate. 

The advantage of GAS compared to Stage is that it has a parallel form, whereas
Stage is inherently sequential. That is, although Stage and GAS perform
similarly, the results of GAS could be reached n times faster if a sufficient number
of processors is available.

5.2. Experiments on ‘rational choice’. The experiments were conducted on
different problem families. Examples are from CNF categories with a prescribed number
of clauses (x) and a prescribed number of variables (y), downloaded from
ftp://dimacs.rutgers.edu/pub/challenge/satisfiability/benchmarks/cnf/.

The files of the first family of problems on the web are specified by the
notation aim-x-y-z-yes1, where z denotes the number of the clause within
category, and yes1 denotes that the clause is satisfiable. Four examples were tested
in each (x, y) category. Notation used in the figures: a: aim x = 200, first digit = y,
second digit = z, third digit = 1 (yes), fourth digit denotes the instance number of
the database.

The other type of problem family is on parity checking. Here, p stands for ‘par8’,
8 means 8-bit problem, the presence or the lack of ‘c’ refers to the classes of the
database, and the last digit denotes the instance number of the database.

[Plot: Averaged Best Objective V1.7 (longLS, test case: par16-1). Legend: Stage (2);
GAwStagenis default (7); GAwStagenis default gen.step; GAwStagenis no-recomb. (3);
GAwStagenis no-recomb. gen.step]

Figure 4. Comparison of best case and worst case performances
of Stage, GAS and GAS-no-recomb on par16-1. The birth times
of new generations are depicted by vertical lines (gen. step) on
the upper panel. The lines become frequent at the right-hand side of
the figure; these frequent lines are not shown.

5.3. Experiments with threshold 12.

Figure 5. Different Fapps tried on problem aim-200-1-6-yes1
with threshold = 12

(a)-(d): Four different problems, aim-200-1-6-yes1-1 to
aim-200-1-6-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 6. Different Fapps tried on problem aim-200-2-0-yes1
with threshold = 12

(a)-(d): Four different problems, aim-200-2-0-yes1-1 to
aim-200-2-0-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 7. Different Fapps tried on problem aim-200-3-4-yes1
with threshold = 12

(a)-(d): Four different problems, aim-200-3-4-yes1-1 to
aim-200-3-4-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 8. Different Fapps tried on problem aim-200-6-0-yes1
with threshold = 12

(a)-(d): Four different problems, aim-200-6-0-yes1-1 to
aim-200-6-0-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 9. Different Fapps tried on problems par8 and
par8c with threshold = 12

(a)-(f): Six different problems, par8-1 to par8-3. The legend of the
figure shows the codes of the corresponding Fapps. (1) Circles
‘a’: aim x = 200, first digit = y, second digit = z, third digit = 1
(yes), last digit denotes the instance number of the database. (2)
Crosses ‘p’: stands for ‘par8’, ‘c’ or the lack of ‘c’ refers to the classes
of the database, last digit denotes the instance number of the
database.

5.4. Experiments with threshold 7.

Figure 10. Different Fapps tried on problem aim-200-1-6-yes1
with threshold = 7

(a)-(d): Four different problems, aim-200-1-6-yes1-1 to
aim-200-1-6-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 11. Different Fapps tried on problem aim-200-2-0-yes1
with threshold = 7

(a)-(d): Four different problems, aim-200-2-0-yes1-1 to
aim-200-2-0-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 12. Different Fapps tried on problem aim-200-3-4-yes1
with threshold = 7

(a)-(d): Four different problems, aim-200-3-4-yes1-1 to
aim-200-3-4-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 13. Different Fapps tried on problem aim-200-6-0-yes1
with threshold = 7

(a)-(d): Four different problems, aim-200-6-0-yes1-1 to
aim-200-6-0-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 14. Different Fapps tried on problems par8 and
par8c with threshold = 7

(a)-(f): Six different problems, par8-1 to par8-3. The legend of the
figure shows the codes of the corresponding Fapps. (1) Circles
‘a’: aim x = 200, first digit = y, second digit = z, third digit = 1
(yes), last digit denotes the instance number of the database. (2)
Crosses ‘p’: stands for ‘par8’, ‘c’ or the lack of ‘c’ refers to the classes
of the database, last digit denotes the instance number of the
database.

5.5. Experiments with threshold 4.

Figure 15. Different Fapps tried on problem aim-200-1-6-yes1
with threshold = 4

(a)-(d): Four different problems, aim-200-1-6-yes1-1 to
aim-200-1-6-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 16. Different Fapps tried on problem aim-200-2-0-yes1
with threshold = 4

(a)-(d): Four different problems, aim-200-2-0-yes1-1 to
aim-200-2-0-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 17. Different Fapps tried on problem aim-200-3-4-yes1
with threshold = 4

(a)-(d): Four different problems, aim-200-3-4-yes1-1 to
aim-200-3-4-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 18. Different Fapps tried on problem aim-200-6-0-yes1
with threshold = 4

(a)-(d): Four different problems, aim-200-6-0-yes1-1 to
aim-200-6-0-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 19. Different Fapps tried on problems par8 and
par8c with threshold = 4

(a)-(f): Six different problems, par8-1 to par8-3. The legend of the
figure shows the codes of the corresponding Fapps. (1) Circles
‘a’: aim x = 200, first digit = y, second digit = z, third digit = 1
(yes), last digit denotes the instance number of the database. (2)
Crosses ‘p’: stands for ‘par8’, ‘c’ or the lack of ‘c’ refers to the classes
of the database, last digit denotes the instance number of the
database.

5.6. Experiments with threshold 2.

Figure 20. Different Fapps tried on problem aim-200-1-6-yes1
with threshold = 2

(a)-(d): Four different problems, aim-200-1-6-yes1-1 to
aim-200-1-6-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 21. Different Fapps tried on problem aim-200-2-0-yes1
with threshold = 2

(a)-(d): Four different problems, aim-200-2-0-yes1-1 to
aim-200-2-0-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 22. Different Fapps tried on problem aim-200-3-4-yes1
with threshold = 2

(a)-(d): Four different problems, aim-200-3-4-yes1-1 to
aim-200-3-4-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 23. Different Fapps tried on problem aim-200-6-0-yes1
with threshold = 2

(a)-(d): Four different problems, aim-200-6-0-yes1-1 to
aim-200-6-0-yes1-4. The legend of the figure shows the codes of the
corresponding Fapps. (1) Circles ‘a’: aim x = 200, first digit =
y, second digit = z, third digit = 1 (yes), last digit denotes the
instance number of the database. (2) Crosses ‘p’: stands for ‘par8’,
‘c’ or the lack of ‘c’ refers to the classes of the database, last digit
denotes the instance number of the database.

Figure 24. Different Fapps tried on problems par8 and
par8c with threshold = 2

(a)-(f):  Six  different  problems,  par8-l  —  par8-3.  Legend  of  the 
figure  refers  shows  the  codes  the  corresponding  Fapps.  (1)  circles 
‘a’:  aim  x  =  200,  first  digit  =  y,  second  digit  =  z,  third  digit  =  1 
(yes),  last  digit  denotes  the  instance  number  of  the  database.  (2) 
crosses  ‘p’  stands  for  ‘par8’,  ‘c’  or  the  lack  of  ‘c’  refers  to  the  classes 
of  the  database,  last  digit  denotes  the  instance  number  of  the  data¬ 
base. 
6. Discussion

In this section we relate our work to evolutionary search algorithms (ESA)
and to artificial neural networks. ESA can be traced back to 1953, when the
Metropolis Monte Carlo algorithm was published by N. Metropolis, A. Rosenbluth,
M. Rosenbluth, A. Teller and E. Teller. This work considers the model (our a
priori knowledge about the problem) and the search algorithm, which aims to find
a global minimum given our knowledge. This search algorithm is a special form of
hill-climbing, called simulated annealing (SA), which makes use of neighborhoods
and selection. Further algorithms were developed (interestingly, sometimes quite
independently), which improved on the original ideas. In 1975 Holland published
a book in which he introduced genetic algorithms [Holland, 1975], and Rechenberg
developed evolutionary strategies [Rechenberg, 1973]. Ever since, a large body of
research has been devoted to these ideas. For a recent review, see [Sharpe, 2000],
and the web [cit, ]. Interestingly, it seems to us that Monte Carlo sampling methods
are not considered part of this field.

Another widely used optimization technique makes use of a special form of
function approximators (FAPPs), and as such it has its origin at the very
beginning of mathematics. This particular area of FAPPs, artificial neural
networks (ANNs), features local learning rules and started earlier than ESA
[McCulloch and Pitts, 1943]. An excellent introductory book on this algorithm
family has been published recently [Haykin, 1999].

Below, we review families of algorithms that intend to unify these two
distinct routes. One specific route deals with the selection (this is an ESA
term) of weights (this is an ANN term). This direction of algorithm development
is called evolving artificial neural networks (EANNs). For a recent review on
EANNs, see [Yao, 1999].

The route we follow is seemingly different. In our approach the cost function
is approximated and the selective algorithm is used to stabilize multiple,
possibly globally optimal, solutions³. The FAPP of Stage [Boyan and Moore, 1998,
Boyan and Moore, 2000] represents the low-frequency regularities of the cost
function. High frequencies tend to be smeared by construction: every state of a
search trajectory is assigned the best value found on that trajectory before
the cost function is approximated.

One might ask how this approach works on NP-complete problems. It has
been known (see e.g., [Gomes et al., 1998, Gomes et al., 2000]) that NP-complete
searches often have heavy tails. In turn, it has been suggested by the Cornell group
[Gomes et al., 1998, Gomes et al., 2000] that randomized search (random restarts)
is appropriate for this family. We relate this observation to the ‘No Free Lunch’
(NFL) Theorem of optimization [Wolpert and Macready, 1997] and read the NFL
Theorem backwards: If we have an upper limit for the duration of the search then
it is better to make use of the NFL Theorem and not spend too much time with
long searches.
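The restart argument can be illustrated with a minimal sketch. It assumes a fixed evaluation budget that is split into independent runs, each cut off after a fixed number of evaluations. The names `restart_search`, `budget` and `cutoff`, as well as the toy bit-string objective, are illustrative assumptions and are not taken from the experiments reported here.

```python
import random


def toy_cost(bits):
    """Toy objective (assumption): number of zero bits, to be minimized."""
    return sum(1 for b in bits if b == 0)


def flip_neighbors(bits):
    """All states reachable by flipping exactly one bit."""
    return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]


def random_bits(rng, n=12):
    return tuple(rng.randint(0, 1) for _ in range(n))


def hill_climb(objective, start, neighbors, max_evals, rng):
    """Stochastic descent from `start`, using exactly `max_evals` evaluations."""
    current, current_cost = start, objective(start)
    evals = 1
    while evals < max_evals:
        candidate = rng.choice(neighbors(current))
        cost = objective(candidate)
        evals += 1
        if cost <= current_cost:
            current, current_cost = candidate, cost
    return current, current_cost, evals


def restart_search(objective, random_state, neighbors, budget, cutoff, seed=0):
    """Spend `budget` evaluations on runs of at most `cutoff` evaluations each."""
    rng = random.Random(seed)
    best, best_cost, used = None, float("inf"), 0
    while used < budget:
        run_evals = min(cutoff, budget - used)
        state, cost, evals = hill_climb(objective, random_state(rng),
                                        neighbors, run_evals, rng)
        used += evals
        if cost < best_cost:
            best, best_cost = state, cost
    return best, best_cost, used
```

A call such as `restart_search(toy_cost, random_bits, flip_neighbors, budget=500, cutoff=50)` performs ten short runs instead of one long run; sweeping `cutoff` is one way to probe how sensitive a problem is to the restart length.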

Stage has an interesting property from this point of view. To see it, consider the
experiments depicted in Fig. 25. The figure shows 30 independent computations
started using Stage and 25 independent computations for GA with Stagenis (GAS).
Slow improvements can be seen in the figure depicting Stage beyond iteration
4 × 10^4. For the case of GAS, in each run we used a population of 100 individuals. In


³ For other types of approximations to search problems, see [Hochbaum, 1995].

Figure 25. Random restarts and the effect of smart restarts on par16-1

turn, for GAS, Stagenis had no effect on the search until 4 × 10^5 evaluation
steps. Up to that point there is little difference between the different WalkSats used
for fitness search, not much improvement can be seen, and the distribution becomes
narrow. Up to this point the case corresponds to the strategy suggested by the
Cornell group [Gomes et al., 1998]. One can also see that sweeping the length of
random restarts might indeed help. The first sudden improvement occurs as early
as circa 200 evaluation steps in one of the runs. The last one occurs at around
2 × 10^4. Beyond that there is a continuous slow improvement. This improvement
could be a function of the parameter delta of WalkSat, and could be tuned. The point,
however, is that around 1 × 10^5, when the initialization of the first population is
just about finished, Stagenis starts to have an effect⁴. The effect is strong:
there is a marked improvement in performance here. Slow improvement can be seen
from generation to generation, either by the effect of GA or the effect of Stage or
both (improvements are achieved at the cross signs).

Let us compare the performance of Stage with that of GAS. The best value
of GAS is at about 15, whereas Stage was able to reach 5 in one of the runs.

This is the case when computations are made on a serial computer. Also, this
comparison is not perfect, because GAS barely improves between 2 × 10^4 and
5 × 10^5, and WalkSat with true local search is suboptimal for GAS: all WalkSats
could be stopped at 2 × 10^4. Taking this 1.5 orders of magnitude into account and
considering 100 parallel computers, the results of Stage at 3 × 10^3 need to be
compared to the final results of GAS. This is a considerable improvement gained
in this parallel form of Stage.

⁴ At this point GA cannot have an effect yet, because the initial generation is being calculated.

Let us consider Stage from the point of view of the NFL theorem [Wolpert and Macready, 1997].
According to this theorem, if we know nothing about the cost function then no
algorithm is better than random search. Stage is no exception to the NFL
theorem: if the conditions of the NFL theorem hold then there is no reason to
choose Stage or Stagenis either. The NFL theorem does not hold, for example,
under the following related conditions:

•  if  our  universe  is  of  low-complexity  [Schmidhuber,  1994], 

•  if  it  is  more  likely  to  encounter  simpler  problems  than  complicated  ones 
[Cover  and  Thomas,  1991],  and 

•  if  our  problem  is  entailed  by  the  Solomonoff-Levin  a  priori  distribution 
[Li  and  Vitanyi,  1997]. 

Under  these  conditions  Stage  has  the  following  properties.  Stage  ‘looks  for’ 
global  structure  given  the  neighboring  relations.  In  turn,  Stage  is  based  on  the 
assumption  that  local  minima  may  uncover  the  global  structure.  Stage  is  an  off¬ 
line  method:  individuals  created  by  Stage  in  Stagenis  can  be  computed  without 
interacting  with  the  system,  i.e.,  without  measuring  the  fitness.  That  is,  Stage  and 
other  parameterized  function  approximators  can  be  relatively  inexpensive  provided 
that  optimization  is  not  limited  by  computer  time  but  by  the  time  that  we  may 
interact  with  the  system.  One  may  say  that  Stage  is  harmless  if  the  NFL  theorem 
holds.  However,  Stage  may  cause  harm  if  the  NFL  theorem  does  not  hold,  because 
Stage  can  bias  our  search.  On  the  other  hand,  if  the  NFL  theorem  does  not 
hold  and  if  the  cost  (time)  of  interaction  is  relatively  large  (long)  compared  to  the 
cost  (time)  of  parametric  optimizations  then  Stage  -  or  other  FAPP  methods  - 
may  provide  considerable  improvements  over  random  search.  The  danger  of  biased 
search  can  be  avoided  by  keeping  a  constant  proportion  of  random  restarts. 
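A minimal sketch of keeping a constant proportion of random restarts is given below. The function name `next_start`, the parameter `random_fraction` and the two callables are hypothetical; the text does not specify how the proportion was enforced in practice.

```python
import random


def next_start(guided_start, random_start, random_fraction, rng):
    """Pick the next starting state for local search.

    With probability `random_fraction` a fresh random state is used, so the
    FAPP-guided restarts can never take over the search completely.
    """
    if rng.random() < random_fraction:
        return random_start(rng), "random"
    return guided_start(), "guided"
```

Here `guided_start` stands for whatever the function approximator proposes (for example, the end state of a search on the approximated cost), and `random_start` draws an unbiased state.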

In  fact,  one  can  read  the  NFL  theorem  backwards:  If  we  have  encoded  all  of 
our  knowledge  into  the  search  problem  and  have  no  further  information  about 
it,  if  we  have  surplus  off-line  computer  power,  and  if  we  keep  our  search  un¬ 
biased  then  the  NFL  theorem  warrants  that  our  efforts  to  find  regularities  and 
to  make  use  of  those  regularities  during  search  will  cause  no  harm.  Stage  and 
GAS  make  use  of  such  parameterized  off-line  evaluations.  Off-line  evaluations  are 
closely  related  to  off-policy  methods,  discussed  extensively  by  Sutton  and  Barto 
[Sutton  and  Barto,  1998]  within  the  framework  of  value  estimation.  An  off-policy 
method  makes  use  of  off-line  computation  but  does  not  influence  the  search  proce¬ 
dure.  Further  theoretical  work  is  needed  here.  The  question  is  how  to  estimate  the 
value  of  off-policies  if  the  condition  of  the  NFL  Theorem  (about  the  infinity  of  the 
problem  space)  may  not  hold.  One  would  like  to  know  how  to  create  and  how  to 
select  among  off-policies  and  when  to  switch  on-  and  off-policies. 

We showed in Sec. 2 that the FAPP of Stage is sometimes unstable. This
problem was avoided by saving the ‘best so far’ solutions to the search problem. In
the case of GAS, this problem is taken into account by the population
management. There is another advantage of the GA part of GAS. GA has the potential
to break the problem into components given its implicit parallelism. In turn, GA
has the potential to discover ‘chunks’ (subgoals), a favorable step of optimization
[Solomonoff, 1986]. The algorithm could be improved by extending this implicit
chunking function via off-line search for chunks (see, e.g., [Schmidhuber, 1994]).
For low-complexity problems special care is needed to avoid biased search, e.g., the
degradation of the GA population. Random restarts and/or mutations can be used for
this purpose.

It is easy to see that GA could also be introduced at the level of FAPPs. It is
noteworthy that if the FAPPs themselves underwent evolutionary computing then
the best FAPPs would be saved and the best regions (which could be narrow)
would be more exposed to evolutionary explorations. In turn, FAPPs can be seen
as an encoding technique for GAs. This way one arrives at EANNs.

6.1.  All  the  Fapps  trained  on  other  examples  perform  well  on  other  problems. 
Pre-trained  Fapps  can  be  used  to  find  rational  solutions  quickly.  Training  of  the 
Fapp  is  not  necessary  but  could  be  accomplished.  Care  is  needed,  however. 

One  should  note  that  in  some  cases  -  in  fact,  in  a  large  number  of  cases  -  Fapps 
of  one  problem  family  perform  better  on  another  problem  family. 

We have tried to learn the best Fapps by using a winner-takes-all (WTA) algorithm.
This means that we started all available Fapps on a single problem and after
several thousand steps the best Fapp was selected and was trained on that problem.
It was expected that the Fapps would partition the problem space and we would gain
the best Fapps for each subproblem. This experiment did not succeed.

The two facts, i.e., that Fapps are not the best approximators on their own
problem and the failure of this WTA experiment, point to the possibility that the
biased search of Stage plays a role here. Tuning of Stage can be misled by
improvement using the Fapp, but then the local search may start in the neighborhood
of a different and possibly less favorable local minimum. This seems to be the case
here.

One may solve this problem by enforcing global optimization in the Fapp space.
This global optimization should not be based on tuning of the Fapps, which would
be misleading. As it appears to us, a better method is to use a fitness value for
each Fapp. The fitness could be equal to the negative value of the best cost that
the Fapp achieved on the full set of problem families. On the one hand, this may
require tremendous computations. On the other hand, this is an off-line method
and results that become available during the different optimization procedures can
be used for generating novel Fapps. Moreover, there should be a large real-time gain
for individual problems when using rigid Fapps.
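The proposed fitness can be sketched as follows. The sketch assumes that the costs a Fapp reached are logged per problem and that the best (lowest) costs are summed over all problems; the summation, the data layout and the names `fapp_fitness` and `rank_fapps` are assumptions made for illustration only.

```python
def fapp_fitness(costs_by_problem):
    """Negative of the summed best (lowest) costs a Fapp reached per problem.

    `costs_by_problem` maps a problem name to the list of costs observed
    while that Fapp was used on the problem (hypothetical log format).
    """
    return -sum(min(costs) for costs in costs_by_problem.values())


def rank_fapps(results):
    """Sort Fapp names by decreasing fitness.

    `results` maps a Fapp name to its `costs_by_problem` dictionary.
    """
    return sorted(results, key=lambda name: fapp_fitness(results[name]),
                  reverse=True)
```

Because the fitness only needs logged costs, it can be recomputed off-line whenever new optimization runs become available.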

Again, the genetic algorithm could be of use here. This is all the more so because
the feature space of the Fapps is small (it was five in our case) and GA can be very
efficient under such circumstances. Moreover, to the best of our knowledge, this is
the first instance when gradient methods may destroy globally optimal solutions.
This unpleasant consequence of the clever combination of local search methods with
a smoothing technique [Boyan and Moore, 2000] is not a major drawback; it can be
easily fixed by adding GA, as mentioned above. This point requires further
studies.

Here is the main message of our models. This message is supported by both
parts of our computational studies: (i) the GA for keeping a larger population of
solutions beyond the best so far instance, and (ii) the development of Fapp families.

7.  Conclusions 

Experiences  with  Stage  and  GA  led  us  to  combine  these  algorithms  and  to 
create  GA  with  Stagenis  (GAS).  We  expected  that  GAS  can  fuse  the  robustness 
and  parallelism  of  GA  and  the  smartness  of  Stage.  In  our  tests,  we  compared  GAS 
with  Stage,  GAwRR  and  a  couple  of  other  heuristics  on  satisfiability  benchmark 
problems. 

We made the following observations, which are in accordance with the literature:

• The plain GA performed poorly, probably due to unsuitable state encoding.
The importance of state encoding is a well-known drawback of genetic
algorithms (see, e.g., [Falkenauer, 1996]) and we have neglected this
problem for the sake of the demonstration.

•  Using  LS  to  compute  the  fitness  of  the  individuals  can  boost  the  perfor¬ 
mance  of  GA  (see  also  [Freisleben  and  Merz,  1997]). 

•  WalkSat  is  very  efficient  on  smaller  problem  instances. 

• Stage shows compelling performance on all of our test problems.

Our  new  findings  are: 

•  Stage  performs  better  on  the  given  problem  domain  than  GA  (with  the 
given  state  encoding). 

•  Stagenis  can  boost  the  performance  of  GA. 

• The performance of GAS is comparable to that of Stage. In most cases
GAS is somewhat slower when executed on a single processor, but on some
problems it was even superior to Stage.

• We proposed a parallel algorithm, GAS. Therefore, GAS can be sped
up if enough parallel processors are available.

•  Stage  is  an  off-line  method  and  may  boost  search  for  ‘nothing’  if  enough 
computational  power  is  available  and  if  either  search  time  or  interaction 
time  with  the  real  system  is  limited. 

Time-limited optimization may be desirable in cases when the clauses of the SAT
problem ‘are not equal’, e.g., the failure to satisfy some of them may be costly
but this cost could be smaller than the cost of not offering any approximate solution.
Such rational choice solutions were studied and it was found that a reasonable
speed-up can be gained.

Finally, we note that the combined algorithm may perform well on problem
families, i.e., on a group of problems, provided that a small change of the problem
(in some metric of the problem space) gives rise to a small change of the solution
(in a metric of the solution space). If such a metric is not known, or in the case
of unrelated problems, no harm will be done by the combined algorithm, provided
that biased search (e.g., a recurrence) is avoided. If this is not the case, then other
methods can be used to augment the optimization and to overcome the biased
search of Stage.
Table 2. Results of the mass testing

Problem    Type of LS  STAGE            GAS              GAwRR            WalkSat
                       BSF     #Evals   BSF     #Evals   BSF     #Evals   BSF     #Evals
f2000      long LS      19    512 223    28    585 039    28    585 039    32    350 373
           short LS     45    714 975    78    540 667    64    392 234    64    725 426
ii16d2     long LS       0    247 709    32    273 974    32    273 974    70    680 705
           short LS      0    242 625    11    565 567    32    486 755    32    796 073
ii32a1     long LS       0     11 046    17     30 091    17     30 091     0     84 402
           short LS      0     43 464     0     47 672     0     47 672     0    131 449
ii32c3     long LS       0  1 404 703     2  1 560 916     7  1 734 502     7  2 323 008
           short LS      0     96 287     1  2 691 189     7  2 571 988     7  3 368 739
ii8a4      long LS       0      1 229     0    243 828     0    243 828     0     82 959
           short LS      0     43 981     0    223 092     0    103 595     0      6 742
ii8b3      long LS       1    525 976     1      5 310     1      5 310     1    850 164
           short LS      1  1 316 227     1  1 077 128     1    700 994     1  1 297 332
par16-2    long LS       5  1 694 021    15  1 608 343    15  1 214 563    15  1 928 988
           short LS     16  1 893 826    10  1 961 045    52  1 112 929    59  2 011 803
par32-1-c  long LS      20    849 085    13    140 793    13    140 793    13  1 000 547
           short LS     13  1 187 205    36  2 289 301    51    702 845    25  1 203 244
par32-3-c  long LS      12  1 165 261    13    521 962    13    521 962    13  1 253 889
           short LS     14  1 065 066     1  2 691 189    24    685 952    24  1 209 921
xx_ii32d1  long LS       0     68 169     0  1 413 775     9  3 471 353     9  3 786 561
           short LS      0     30 916     0    514 149     9  3 836 964     9  5 126 792

BSF = Obj of best state found, #Evals = number of evaluated states
Appendix


Algorithm 1  The WalkSat algorithm in pseudo-code

PROCEDURE doOneStep(clause)
    SET Q = {states reachable by flipping one variable in the clause}
    SELECT (one of) the state(s) with the best Obj value in Q: st
    IF Obj(st) < Obj(previous_state)
    THEN ACCEPT the step
    ELSE
        IF Obj(st) < (Obj(previous_state) + DELTA)
        THEN ACCEPT
            with probability (1 - NOISE) this step
            with probability NOISE a random step
        ELSE remain in the current state
        FI
    FI
END

// main loop
REPEAT
    doOneStep(random unsatisfied clause)
    IF Obj(st) < Obj(best_state_so_far)
    THEN best_state_so_far = st
    FI
UNTIL (random value < CUTOFF)
    OR (the state remained unchanged for PATIENCE steps)
    OR (<#Gens | #LS | #Evals> exceeds <MaxGens | MaxLS | MaxEvals>)
RETURN best_state_so_far, and the trajectory
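Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: clauses are lists of signed integers (DIMACS-style literals), Obj is taken to be the number of unsatisfied clauses, the "random step" is assumed to be a flip of a random variable of the chosen clause, the random CUTOFF test is omitted, the search also stops once Obj reaches zero, and the trajectory is not recorded. All function and parameter names are hypothetical.

```python
import random


def num_unsat(clauses, assign):
    """Obj: number of clauses without a true literal (assign: var -> bool)."""
    return sum(1 for c in clauses
               if not any(assign[abs(lit)] == (lit > 0) for lit in c))


def flipped(assign, var):
    """Copy of `assign` with variable `var` negated."""
    new = dict(assign)
    new[var] = not new[var]
    return new


def walksat(clauses, n_vars, max_evals=10_000, delta=1, noise=0.3,
            patience=200, seed=0):
    rng = random.Random(seed)
    state = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    cost = num_unsat(clauses, state)
    best_state, best_cost = state, cost
    evals, unchanged = 1, 0
    while cost > 0 and evals < max_evals and unchanged < patience:
        unsat = [c for c in clauses
                 if not any(state[abs(lit)] == (lit > 0) for lit in c)]
        clause = rng.choice(unsat)
        # Q: states reachable by flipping one variable of the chosen clause.
        candidates = [flipped(state, abs(lit)) for lit in clause]
        costs = [num_unsat(clauses, s) for s in candidates]
        evals += len(candidates)
        k = min(range(len(candidates)), key=costs.__getitem__)
        if costs[k] < cost:                      # strict improvement: accept
            state, cost, unchanged = candidates[k], costs[k], 0
        elif costs[k] < cost + delta:            # near-tie: noisy acceptance
            if rng.random() < noise:
                k = rng.randrange(len(candidates))  # random step in the clause
            state, cost, unchanged = candidates[k], costs[k], 0
        else:                                    # remain in the current state
            unchanged += 1
        if cost < best_cost:
            best_state, best_cost = state, cost
    return best_state, best_cost, evals
```

For example, `walksat([[1, 2], [-1, 3], [-2, -3], [1, -3]], n_vars=3)` returns an assignment together with its Obj value and the number of evaluated states.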
Algorithm 2  The Stage algorithm in pseudo-code

REPEAT
    RUN LS from x_0
    // The trajectory is (x_0, x_1, ..., x_N)
    SET y to the value of the best state in the trajectory
    RETRAIN the FAPP with the pairs (x_i, y)
    RUN FLS starting from x_N
    SET x_0^new to the end state of the FLS
    IF x_0^new = x_0
    THEN SET x_0 to a random starting state
    ELSE x_0 = x_0^new
    FI
UNTIL <#Gens | #LS | #Evals> exceeds <MaxGens | MaxLS | MaxEvals>
RETURN the best state found
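The loop of Algorithm 2 can be illustrated with the following sketch. It rests on several assumptions that are not taken from the text: the FAPP is a linear model over user-supplied features trained by least-mean-squares updates, LS and FLS are both plain stochastic hill-climbing, FLS starts from the last state of the LS trajectory, and the problem is a toy bit-string task. All names are hypothetical.

```python
import random


def toy_cost(bits):
    return sum(1 for b in bits if b == 0)


def toy_features(bits):
    return [sum(bits) / len(bits)]          # single feature: fraction of ones


def flip_neighbors(bits):
    return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]


def random_bits(rng, n=10):
    return tuple(rng.randint(0, 1) for _ in range(n))


def local_search(cost, start, neighbors, steps, rng):
    """Stochastic hill-climbing; returns the trajectory (x_0, ..., x_N)."""
    trajectory = [start]
    for _ in range(steps):
        cand = rng.choice(neighbors(trajectory[-1]))
        if cost(cand) <= cost(trajectory[-1]):
            trajectory.append(cand)
    return trajectory


class LinearFapp:
    """Linear approximator with LMS training (an assumption of this sketch)."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * (n_features + 1)   # last weight is the bias
        self.lr = lr
        self.data = []

    def predict(self, feats):
        return sum(w * f for w, f in zip(self.w, list(feats) + [1.0]))

    def retrain(self, pairs, epochs=5):
        self.data.extend(pairs)
        for _ in range(epochs):
            for feats, y in self.data:
                err = y - self.predict(feats)
                for k, f in enumerate(list(feats) + [1.0]):
                    self.w[k] += self.lr * err * f


def stage(cost, features, random_state, neighbors, rounds=20, steps=50, seed=0):
    rng = random.Random(seed)
    x0 = random_state(rng)
    fapp = LinearFapp(len(features(x0)))
    best = x0
    for _ in range(rounds):
        traj = local_search(cost, x0, neighbors, steps, rng)    # RUN LS from x_0
        y = min(cost(x) for x in traj)                          # best value seen
        best = min([best] + traj, key=cost)
        fapp.retrain([(features(x), y) for x in traj])          # pairs (x_i, y)
        guided = local_search(lambda x: fapp.predict(features(x)),
                              traj[-1], neighbors, steps, rng)  # RUN FLS
        x0_new = guided[-1]
        x0 = random_state(rng) if x0_new == x0 else x0_new      # restart rule
    return best, cost(best)
```

A call such as `stage(toy_cost, toy_features, random_bits, flip_neighbors)` returns the best state found and its cost; swapping in another approximator only requires an object with `predict` and `retrain` methods.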
Algorithm 3  The GAwRR algorithm in pseudo-code

PROCEDURE LS.optimize(x)
    RETURN (the best Obj found by a LS starting from x)
END

INITIALIZE old_population with random states
FOR EACH x_i ∈ old_population
    Fitness(x_i) = LS.optimize(x_i)

REPEAT
    Step 1: SELECT recombination_rate · population_size pairs (x_i, x_j)
            from old_population with selection likelihood
            proportional to Fitness(·)
        FOR EACH pair (x_i, x_j)
            SET (x'_i, x'_j) = recombinate(x_i, x_j)
            INSERT (x'_i, x'_j) into new_population
            SET Fitness(x'_i) = LS.optimize(x'_i)
            SET Fitness(x'_j) = LS.optimize(x'_j)
    Step 2: INSERT mutation_rate · population_size random members
            in new_population
        FOR EACH new member x_i
            SET Fitness(x_i) = LS.optimize(x_i)
    Step 3: FILL_UP the rest of new_population with the best members
            of old_population according to their fitness
        FOR EACH such member
            retain the old Fitness(·)
UNTIL <#Gens | #LS | #Evals> exceeds <MaxGens | MaxLS | MaxEvals>
RETURN the best state ever found during LS runs.
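The three steps of Algorithm 3 can be sketched as below. The sketch makes assumptions that the pseudo-code leaves open: Obj is a cost to be minimized, so the selection weight is taken as 1 / (1 + cost); one-point crossover plays the role of `recombinate`; a short stochastic descent on a toy bit-string problem plays the role of `LS.optimize`; and the stopping rule is a fixed number of generations. All names are hypothetical.

```python
import random


def toy_cost(bits):
    return bits.count(0)


def random_bits(rng, n=12):
    return tuple(rng.randint(0, 1) for _ in range(n))


def toy_ls(x, rng, steps=30):
    """Short stochastic descent standing in for LS.optimize."""
    for _ in range(steps):
        i = rng.randrange(len(x))
        y = x[:i] + (1 - x[i],) + x[i + 1:]
        if toy_cost(y) <= toy_cost(x):
            x = y
    return x, toy_cost(x)


def one_point(xi, xj, rng):
    cut = rng.randrange(1, len(xi))
    return xi[:cut] + xj[cut:], xj[:cut] + xi[cut:]


def gawrr(random_state, ls_optimize, recombine, pop_size=20,
          recombination_rate=0.3, mutation_rate=0.1, generations=10, seed=0):
    rng = random.Random(seed)
    best = {"state": None, "cost": float("inf")}

    def fitness(x):
        # Fitness of x is the best cost LS reaches from x; x itself is kept.
        state, cost = ls_optimize(x, rng)
        if cost < best["cost"]:
            best["state"], best["cost"] = state, cost
        return cost

    old = [(x, fitness(x)) for x in (random_state(rng) for _ in range(pop_size))]
    for _ in range(generations):
        new = []
        # Step 1: recombine pairs; lower cost means higher selection weight.
        weights = [1.0 / (1.0 + c) for _, c in old]
        for _ in range(int(recombination_rate * pop_size)):
            (xi, _), (xj, _) = rng.choices(old, weights=weights, k=2)
            for child in recombine(xi, xj, rng):
                new.append((child, fitness(child)))
        # Step 2: insert random members (the random restarts).
        for _ in range(int(mutation_rate * pop_size)):
            x = random_state(rng)
            new.append((x, fitness(x)))
        # Step 3: fill up with the best old members, keeping their fitness.
        survivors = sorted(old, key=lambda m: m[1])[:max(0, pop_size - len(new))]
        old = (new + survivors)[:pop_size]
    return best["state"], best["cost"]
```

Since every call to `fitness` is an independent local search, the evaluations inside one generation could be distributed over parallel workers.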
Algorithm 4  The GAS algorithm in pseudo-code

PROCEDURE LS.optimize(x)
    RETURN the best Obj found by a LS starting from x
END

PROCEDURE FLS.optimize(x)
    RETURN the best state found by a FLS starting from x
END

INITIALIZE old_population with population_size · (1 - stagenis_rate)
        random members x_i
FOR EACH x_i ∈ old_population
    Fitness(x_i) = LS.optimize(x_i)
    LOAD trajectory i into FAPP
FOR i = 1 TO population_size · stagenis_rate
    SET x_i = FLS.optimize(random state)
    INSERT x_i in old_population
    Fitness(x_i) = LS.optimize(x_i)
    LOAD trajectory i into FAPP

REPEAT
    FOR i = 1 TO population_size · (stagenis_rate + 2 · recombination_rate)
        IF i = 0 mod ⌊population_size · (stagenis_rate + 2 · recombination_rate)
                      / (population_size · stagenis_rate)⌋
        THEN
            x_i = FLS.optimize(random state)
            INSERT x_i in new_population
            Fitness(x_i) = LS.optimize(x_i)
            LOAD trajectory i into FAPP
        ELSE
            SELECT (x_i, x_j) of old_population with
                selection likelihood proportional to Fitness(·)
            SET (x'_i, x'_j) = recombinate(x_i, x_j)
            INSERT (x'_i, x'_j) in new_population
            Fitness(x'_i) = LS.optimize(x'_i)
            Fitness(x'_j) = LS.optimize(x'_j)
            LOAD both trajectories into FAPP
            INCREMENT i  // recomb. generates 2 new members!
        FI
    FILL_UP the rest of new_population
        with the best members of old_population
        according to their fitness
    FOR EACH such member retain the old Fitness(·)
UNTIL <#Gens | #LS | #Evals> exceeds <MaxGens | MaxLS | MaxEvals>
RETURN the best state ever found during LS runs.
Abstract. The first task is to discover and characterize the substrate of
distributed computation, which is the Internet in our case. The second
task is to develop methods (models) for collaborative interactions.

According to recent discoveries, the Internet has a special structure,
called scale-free small world. To some extent, the structure of the Internet
is described by different models. In a parallel project we have introduced
a novel model, the HebbNet, which represents competitive-collaborative
activities in a more sensible way than other existing models of the
Internet. The particular properties of the Internet and HebbNets are described
here.

Second, we have downloaded a portion of the Internet. The operation
and the gain of efficiency using the competitive-collaborative model were
studied on this downloaded part. A model for injecting novel information
into the downloaded portion of the Internet was developed. Computer
simulations demonstrate the idea: (i) competition gives rise to a division
of work, the division of searched portions of the reinforced robotic
Internet agents, (ii) the division of work gives rise to specialization and
increased efficiency for the specialized robotic agents.

The conclusion section (Section 5) contains information about
necessary future steps, which are not covered by the present
project and are in the basic research phase at this very
moment.

The Report is extended by Appendices:

(1) Descriptions of concepts and methods, like reinforcement learning,
text classification, and SVMs, are provided in the Appendix.

(2) A large body of experiments on HebbNets is provided in an
accompanying technical report, titled: Meta level analysis of Hebbian
evolving networks, http://people.inf.elte.hu/lorincz/Files/NIPG-ELU-21-04-2002.pdf


Date :  6th  June  2002. 




PRINCIPAL INVESTIGATOR: DR. HABIL. ANDRAS LORINCZ

Author:  Andras  Lorincz 


Contents

List of Figures
1. Introduction
   Report organization
2. Internet and other networks
   2.1. Short overview
   2.2. Introduction
   2.3. Description of HebbNet
   2.4. Results
3. Competition, which Results in Collaboration
   3.1. Consequences of Novel Findings about the Internet
   3.2. Self-Organizing System (SOS) for Online Collaboration
4. Discussion
5. Conclusions
Appendix
6. Crawlers
   6.1. Introduction
   6.2. Methods
   6.3. Features and learning to search
   6.4. Breadth first crawler
   6.5. Results and discussion
   6.6. Conclusions
References


EOARD-NIPG-ELU-03-  JUNE-2002 




List  of  Figures 

1 The symmetric kernel function

Dependence of the temporal kernel on the time difference t_i − t_j between
the spike sent by neuron j to neuron i and the spike emitted by neuron i.
Kernel parameters used in the computer studies: (i) the ratio of the lengths
of the weakening and strengthening regions (r_l) and (ii) the ratio of
the amplitudes (r_A). During our simulations r_l was set to 1,
whereas r_A was 0.5. For results on other parameters, see the technical
report [Palotai et al., 2002].

2 Connection matrices (W) for 100 neurons

A: Connection matrix for Region 1. r_ex, the ratio of externally excited
neurons, is 30%. r_f, the percentage of neurons allowed to fire, is 30%. B:
Connection matrix for Region 2 (r_ex = r_f = 70%). The matrices are
normalized and darker colors correspond to higher values (black: 1,
white: 0).

3 Harmonic mean distances

This figure displays the local harmonic mean distances in ascending
order for both regions. For better visualization not all data points
are marked and the points are connected with a solid line. Lines with
diamonds belong to the network that was developed using STDP
learning. Lines with circles belong to networks with the same but
randomly redistributed weights. Lines with empty (solid) markers
represent a HebbNet of Region 1 (Region 2). In Region 2 the emerging
HebbNet has much smaller local harmonic distances than
the corresponding random net. The global harmonic mean distances for the
original and for the randomized networks in Region 1 (Region 2) are
D_h = 5.25 and D'_h = 5.01 (D_h = 6.28 and D'_h = 4.98), respectively.

4 Cumulative weighted degree distribution

Cumulative distribution of the sum of the weights of outgoing
connections for the network (r_ex = r_f = 70, N = 100, the number of
histogram bins (discretization) is d = 30). k* denotes the discretized
weighted connection number. The figure shows scale-free properties
(power-law distribution $P(k^*) \approx (k^*)^{-\gamma} e^{-k^*/\xi}$ with
$\gamma \approx 1.61$ and $\xi \approx 20$) in a relatively broad region.

5 Average local distance vs. excitation ratio

These two figures demonstrate the system's robustness to the magnitude
of the excitation.

A: r_A = 0.5, r_l = 0.67. B: r_A = 0.5, r_l = 1.5. Diamonds: average local
distances for the evolving network. Circles: average local distances for
the corresponding random net.

6 Hostess-crawler system

The Internet is explored by crawlers (which download and examine
documents), whereas communication with other entities is governed
by the hostess, which can launch and modify crawlers based on
reinforcements.

7 Algorithm for forming a weblog

The algorithm has three important parameters: (i) ‘memory’: size
of the environment around a node of the network, examined during the
evaluation of that node, (ii) ‘wlsize’: size of the weblog (number of
links that can be remembered), and (iii) ‘γ’: discount factor of past
rewards received around a node.

8 Two competing crawlers using weblogs in a "small-world"

The figure depicts the connectivity table (representing the links)
between nodes. Typical scale-free small world connectivity can be seen
here: some nodes have many links pointing onto them (rows), whereas
others have a long list of links pointing to other nodes (columns).

Green (red) dots depict the links used by the first (second) crawler.
Efficient, self-organizing division of the world, i.e., the task space, can
be seen. Weblogs contained 10 links.

9 Division of a substructure of the Internet between two crawlers

The gray region belongs to crawler A. Both crawlers maintain a list of
weblogs, which are efficient restart links to continue search when novel
information in the actual neighborhood has been collected. Weblogs
are ordered by their values and change dynamically amongst crawlers
during competition.

10 Decrease of age of found novel documents

New documents appear, whereas old documents disappear in this
experiment. Competition settles around "time" 600. The overlap
amongst links searched by crawlers becomes minimized by this time.

Both crawlers have high connectivity weblogs. The age of found novel
documents is decreasing afterwards.

11  Increase  of  access  to  novel  documents. 

New  documents  appear,  whereas  old  documents  disappear  in  this 
experiment.  Competition  settles  around  "time"  600.  The  overlap 
amongst  links  searched  by  crawlers  becomes  minimized  by  this  time. 

Both crawlers have high connectivity weblogs. The efficiency of found
novel documents is increasing afterwards.

12  Context  of  the  document 

Document, and its first and second ‘neighbors’.

13  SVM  based  document  classifiers  (A)  Classification  of  distance 
from  document  using  SVM  classifiers.  The  CFG  method  maintains  a 
list  of  visited  links  ordered  according  to  the  SVM  classification.  One 
of the links belonging to the best non-empty classifier is visited next.
1. Introduction

Cooperation requires some knowledge about the ‘arena’ of cooperation. In
our case, this arena is the Internet. Interestingly, recent discoveries have
demonstrated that a large variety of networks, such as social networks, scientific
collaboration networks, hardware networks at the level of proxies, Internet
connections, HTML links, etc., share the same structure. Moreover, the suspicion
has been growing that all (or at least a large number of) ‘sustained’ systems
share the same connectivity property, provided that the underlying
concept of connectivity is discovered. Two concepts have to be explained here:

•  the  concept  of  a  sustained  system  and 

•  the  word  connectivity. 

A system is called sustained (i) if it is not closed from the environment and (ii)
if it is receiving energy from the environment. Examples include a cell of the
body, the Earth within the solar system, and a research group of a university.

Connectivity is a loosely defined word to describe interaction amongst
collaborative-competitive units. It is a loose term because, in its crudest form,
it is not concerned with the strength, but only with the existence of the
interaction. The usage of this word is as follows: if two units are engaged in some
kind of interaction then they are connected. Interaction can be directed or
mutual. For example, (mostly) directed interaction occurs between the wind
and the sailboat. Interactions are never fully directed; the sailboat also
influences, although only to a minor extent, the local properties of the wind. An
example of a more balanced interaction is the teacher-student relation, whereas
an almost fully balanced interaction may occur among some ants. Given the type(s) of
interaction, one ends up with a directed or with an undirected network of interacting
units.

Report  organization. 

(1)  First,  novel  concepts  of  the  Internet  are  provided.  This  part  of  the 
report  contains  the  description  of  our  model  of  the  Internet,  which  we 
call  HebbNets. 

(2) The second part contains the results of our competitive-collaborative
model. Specialization of Internet agents, division of work and the
increase of efficiency are demonstrated on a downloaded part of the
Internet.

(3) Discussions can be found and conclusions are drawn in Sections 4 and
5, respectively.

(4) Details about intelligent crawlers are given in the Appendix.

(5) A large body of experiments on HebbNets is provided in an accompanying
technical report, titled: Meta level analysis of Hebbian evolving
networks, http://people.inf.elte.hu/lorincz/Files/NIPG-ELU-21-04-2002.pdf

2.  Internet  and  other  networks 

2.1. Short overview. Interactive systems with network structure have
recently become a fascinating area of research interest. Dynamic systems that
govern the formation of these networks are of central importance because of
the intriguing similarities among many biological, social, and information
processing networks. The original model of Watts and Strogatz [D. J. Watts and
S. H. Strogatz, Nature 393, 440, (1998)] explores random restructuring of links
among a finite number of ‘nodes’. Other works have dealt with growing
structures and optimization of the link structure of finite systems. In this paper we
present and study ‘HebbNets’, networks in which structural changes are
governed by Hebbian learning rules. We find that Hebbian learning is also
capable of developing all kinds of network structures, including small-world and
scale-free networks. Our results may support the idea of Edelman [G. M. Edelman:
Neural Darwinism, New York, Basic Books (1987)] that the development
of the central nervous system may have evolutionary components.


2.2. Introduction. The last few years have witnessed the evolution of novel
and efficient ways of describing complex interactive systems (CISs). The novel
description of a CIS is based on graphs with nodes and (directed) edges,
representing the constituents of the system and the interactions among them.
Classification of CISs is based on the statistical properties of the network. Similar
network structures may be found in many different fields. Both these systems
and the corresponding dynamical models, which define the formation of
these networks, may be of fundamental importance for understanding the behavior
of CISs. The interest in CISs is boosted by the intriguing similarities of
biological, social, and information processing networks [Watts and Strogatz, 1998,
Kleinberg, 1998, Albert et al., 1999, Barabasi and Albert, 1999, Barabasi et al., 2000,
Marchioria and Latorac, 2000, Latora and Marchiori, 2001, Bohland and Minai, 2001,
i Cancho and Sole, 2001, Albert and Barabasi, 2002]. The original model of
the World Wide Web (WWW) by Watts and Strogatz [Watts and Strogatz, 1998]
explored random restructuring of the links among a finite number of ‘nodes’.
Barabasi and his colleagues introduced the concept of preferential
attachment to provide a more sophisticated model of the WWW [Albert et al., 1999,
Barabasi and Albert, 1999]. The idea has been extended to other types of
networks [Barabasi et al., 2000]. Another approach deals with optimization of the link
structure of finite systems [i Cancho and Sole, 2001].

Probably the most complex network is inside us: the most exciting
properties of our brain have a lot to do with the special connection system among
its units. Although our knowledge of the building blocks is increasing, we are
still far from a complete understanding of the structure, and research on the
central nervous system (CNS) is primarily targeting these questions. At the
same time, many similar questions on the architecture of the WWW arise. The
recognition of the parallel nature resulted in the intuitive idea to highlight the
similarity between these distinct phenomena. On the one hand, it has been
suggested to use the mutual activity correlation (that is, the original form of
Hebbian learning) in modeling organizational learning [Kulkarni et al., 2000]. On
the other hand, similar structural characteristics of the web and the single
completely described nervous system, that of the nematode C. elegans, have been reported
[Watts and Strogatz, 1998]. In this paper we investigate the question whether
an evolving network, governed by a Hebbian rule, has the same or similar
properties as found by studying the web or social networks. The question is of central
importance: in seeking the answer we hope to find general underlying
principles, which give rise to small-world like structures in cooperative-competitive
systems.
It may be important to note here that concepts of Hebbian learning have
undergone revolutionary changes in the last few years. The original suggestion of
Hebb [Hebb, 1943] has been modified by recent findings [Markram et al., 1997,
Magee and Johnston, 1997, Bell et al., 1997]. (For a review, see, e.g., the work
of Abbott and Nelson [Abbott and Nelson, 2000].) The novel concept is called
spike-time dependent synaptic plasticity (STDP). The underlying
mathematical concepts could correspond to linear and non-linear versions of principal
component analysis [Oja, 1982, Abbott and Nelson, 2000].

2.3. Description of HebbNet. We intended to construct a model network
(HebbNet in short) in which the structural changes are governed by Hebbian
rules and the interaction with the environment and all the interacting elements
of the network are as simple as possible. We assume that the inputs to the
network have no spatio-temporal structure, i.e., the input is randomly
generated. In turn, a search for either spatial or temporal correlations is
meaningless. In contrast to the web (WWW), the input does not depend on the
actual network structure. Results on temporally symmetric STDP (Fig. 1)
are reported here. Quantitatively similar results can be derived for temporally
asymmetric STDP. Results are thoroughly detailed in our technical report (TR)
[Palotai et al., 2002].

Our models consist of $N$ simplified integrate-and-fire like ‘neurons’
or nodes. The dynamics of the internal activity is written as

(1)  $\frac{\Delta a_i}{\Delta t} = \sum_{j \in S} w_{ij} a_j^s + x_i^{(ext)}$

for $i = 1, 2, \ldots, N$. Here $x^{(ext)} \in (0, 1)^N$ denotes the randomly generated
input from the environment, $a_i$ is the internal activity of neuron $i$, and $w_{ij}$ is the
connection strength from neuron $j$ to neuron $i$. $S$ is the set of nodes with
nonzero connection to node $i$. $a_j^s$ is equal to 1 if the $j$th neuron fires (the
neuron outputs a spike), where superscript $s$ stands for ‘spiking’, and $w_{ij} a_j^s$ is the
excitation received by neuron $i$ from neuron $j$ when neuron $j$ fires. After firing
($a_j^s = 1$), $a_j$ is set to zero immediately. Equation 1 describes the simplest form
of ‘integrate and fire’ network models.
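As an illustration only, one discrete update of Equation 1 could be sketched as follows. The function name `integrate_step`, the unit time step, and the choice that a neuron which has just fired restarts from zero and may already collect input in the same step are assumptions made for this sketch; they are not taken from the report.

```python
def integrate_step(a, w, spikes, x_ext):
    """One unit-time update of the internal activities (sketch of Eq. 1).

    a      -- list of internal activities a_i
    w      -- w[i][j] is the connection strength from neuron j to neuron i
    spikes -- spikes[j] is 1 if neuron j fired in the previous step, else 0
    x_ext  -- list of external inputs x_i^(ext), each drawn from (0, 1)
    """
    n = len(a)
    new_a = []
    for i in range(n):
        # Assumption: the activity of a neuron that has just fired is reset to zero.
        base = 0.0 if spikes[i] else a[i]
        # Sum over the set S of neurons with a nonzero connection to neuron i.
        drive = sum(w[i][j] * spikes[j] for j in range(n) if w[i][j] != 0.0)
        new_a.append(base + drive + x_ext[i])
    return new_a


if __name__ == "__main__":
    a = [0.2, 0.5, 0.0]
    w = [[0.0, 0.5, 0.0],
         [0.0, 0.0, 0.0],
         [1.0, 0.25, 0.0]]
    print(integrate_step(a, w, spikes=[0, 1, 0], x_ext=[0.1, 0.1, 0.1]))
```

Plain lists are used to keep the sketch dependency-free; a matrix library would express the same update as one matrix-vector product.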

Instead of applying a given threshold $\theta$, we simply selected the $r_f$ percent of
nodes with the highest activity. This choice, which requires global information
about network properties, is only a matter of convenience: quantity ratios are
used throughout the paper. Using a threshold or treating internal activity as
firing probability has yielded qualitatively the same results [Palotai et al., 2002].
The latter methods make use of local (instead of global) properties. Synaptic
strengths were modified as follows:

(2)  $\frac{\Delta w_{ij}}{\Delta t} = \sum_{t_i, t_j} K(t_j - t_i)$

where $K$ is a symmetric kernel function which defines the influence of the
temporal activity correlation on synaptic efficacy. The form of our chosen kernel
is shown in Fig. 1. The kernel is a function of the time differences and is
independent of time shifts. Given the large variety of neuronal connectivity
and interaction types, we go beyond strict neurobiological modelling, where
experimental observations are directly applied in choosing the kernel function

EOARD-NIPG-ELU-03-  JUNE-2002 


[Figure 1 plot: kernel amplitude $K(\Delta)$ versus time difference $\Delta$.]

FIGURE 1. The symmetric kernel function

Dependence of the temporal kernel on the time difference $t_i - t_j$
between the spike sent by neuron $j$ to neuron $i$ and the spike emitted by
neuron $i$. Kernel parameters used in the computer studies: (i)
ratio of the length of the weakening and of the strengthening regions
($\tau_L = \frac{L_-}{L_+}$) and (ii) ratio of the amplitudes ($\tau_A = \frac{A_-}{A_+}$).
During our simulations $\tau_L$ was set to 1, whereas $\tau_A$ was 0.5. For
results on other parameters, see the technical report
[Palotai et al., 2002].


(e.g., as findings in [Zhang et al., 1998] have been used in [Levy et al., 2001]).
For the sake of generality, broader ranges of parameters are studied. However,
it is worth mentioning that such symmetric kernels do exist in real neuronal
networks [Abbott and Nelson, 2000].
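The firing rule and the kernel-based weight change described above can be sketched as follows. This is a minimal illustration under stated assumptions: the piecewise-constant kernel shape, the reading of $\tau_L$ and $\tau_A$ as weakening-to-strengthening ratios of length and amplitude, and all function and parameter names (`select_firing`, `kernel`, `weight_change`, `a_plus`, `l_plus`) are hypothetical and not taken from the report.

```python
def select_firing(activity, fraction):
    """Return 0/1 spike flags for the given fraction of most active neurons."""
    n_fire = int(round(fraction * len(activity)))
    ranked = sorted(range(len(activity)), key=lambda i: activity[i], reverse=True)
    spikes = [0] * len(activity)
    for i in ranked[:n_fire]:
        spikes[i] = 1
    return spikes


def kernel(delta, a_plus=1.0, l_plus=1.0, tau_a=0.5, tau_l=1.0):
    """Symmetric kernel K(delta); depends only on |delta|.

    Assumed shape: strengthening (+a_plus) for |delta| <= l_plus,
    weakening (-tau_a * a_plus) over the next tau_l * l_plus, zero beyond.
    """
    d = abs(delta)
    if d <= l_plus:
        return a_plus
    if d <= l_plus + tau_l * l_plus:
        return -tau_a * a_plus
    return 0.0


def weight_change(spike_times_i, spike_times_j, **kernel_args):
    """Sum of K(t_j - t_i) over all pairs of spike times of neurons i and j."""
    return sum(kernel(tj - ti, **kernel_args)
               for ti in spike_times_i
               for tj in spike_times_j)


if __name__ == "__main__":
    print(select_firing([0.1, 0.9, 0.4, 0.7], fraction=0.5))
    print(weight_change([0.0], [0.5, 1.5]))
```

Because the kernel depends only on the absolute time difference, swapping the roles of the two spike trains leaves the weight change unchanged, which is the symmetry property the text relies on.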

In the first place, we have been interested in the emerging local and global
connectivity structure of $W$. It has been debated whether the global structure
property ($L$, the average number of edges on the shortest path) and the
clustering coefficient ($C$) proposed by Watts and Strogatz [Watts and Strogatz, 1998]
are appropriate for describing weighted networks [Latora and Marchiori, 2001].
We have applied both the Watts and Strogatz measures and the connectivity
length ($D_h$) measure suggested in [Marchioria and Latorac, 2000], and similar
results have been obtained [Palotai et al., 2002]. Due to its simplicity and logic
we have chosen the latter to present our results.

The physical distance between two nodes is considered as the inverse of
their connection weight. One can also talk about distances of (indirect) paths
between the same two nodes. The shortest distance on the graph, $d_{ij}$, is defined
by the lower bound of all possible path lengths in the graph from node $j$ to
node $i$. Consider that path length is, apart from a constant scaling factor,
inversely proportional to delivery time, or rate of information transfer. The
method of connectivity length [Marchioria and Latorac, 2000] makes use of the
efficiency of information transfer instead of direct physical distances between
nodes. The local harmonic mean distance for node $i$ is

(3)  $D_h(i) = \frac{n_i}{\sum_{j} \frac{1}{d_{ij}}}$

where $n_i$ is the number of neurons around neuron $i$ with $w_{ij} > 0$. The mean
global  distance  in  the  network  can  be  characterized  by  the  following  quantity: 


(4)  $D_h = \frac{N(N - 1)}{\sum_{i,j} \frac{1}{d_{ij}}}$


Global distance provides a measure for the size (or the diameter) of the
network. According to [Marchioria and Latorac, 2000, Bohland and Minai, 2001]
the local harmonic mean distance measure may correspond to $1/C$ (inverse of the
clustering coefficient), whereas the global value corresponds to $L$.
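The distance measures above can be illustrated with a short sketch. It assumes edge length $1/w_{ij}$, computes shortest paths with the Floyd-Warshall algorithm (a choice made here, not stated in the report), excludes the diagonal terms $i = j$ from the global sum, and takes the local measure as the harmonic mean of the shortest distances from the $n_i$ neurons with $w_{ij} > 0$. All function names are hypothetical.

```python
import math


def shortest_distances(w):
    """All-pairs shortest distances; d[i][j] is the distance from node j to node i.

    A direct link j -> i with weight w[i][j] > 0 has length 1 / w[i][j].
    """
    n = len(w)
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0.0
        for j in range(n):
            if i != j and w[i][j] > 0:
                d[i][j] = 1.0 / w[i][j]
    # Floyd-Warshall relaxation: route j -> k -> i if that is shorter.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def global_harmonic_distance(d):
    """D_h = N(N-1) / sum over i != j of 1/d_ij; unreachable pairs add zero."""
    n = len(d)
    inv_sum = sum(1.0 / d[i][j] for i in range(n) for j in range(n) if i != j)
    return n * (n - 1) / inv_sum if inv_sum > 0 else math.inf


def local_harmonic_distance(w, d, i):
    """Harmonic mean of d_ij over the n_i neurons j with w_ij > 0 (assumed form)."""
    neighbors = [j for j in range(len(w)) if j != i and w[i][j] > 0]
    if not neighbors:
        return math.inf
    return len(neighbors) / sum(1.0 / d[i][j] for j in neighbors)


if __name__ == "__main__":
    # Symmetric chain 0 - 1 - 2 with weights 0.5 and 1.0.
    w = [[0.0, 0.5, 0.0],
         [0.5, 0.0, 1.0],
         [0.0, 1.0, 0.0]]
    d = shortest_distances(w)
    print(global_harmonic_distance(d), local_harmonic_distance(w, d, 1))
```

Using the harmonic rather than the arithmetic mean means that disconnected pairs contribute zero to the sum instead of making the measure infinite.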


2.4. Results. In our simulations the network parameters were systematically
varied [Palotai et al., 2002]. The emerging structures were most sensitive to the
following parameters: (i) the magnitude of the external excitation (defined by
the average percentage of neurons receiving excitation from the environment,
$r_{ex}$) and (ii) the ratio of firing neurons ($r_f$). According to the results, for
$N = 100$, two essentially different parameter regions may be separated. Region
1: the values of both $r_{ex}$ and $r_f$ are under 0.5. Region 2: the
values of both $r_{ex}$ and $r_f$ are above 0.5. The following figures show some
typical results.


(A)  (B) 

FIGURE  2.  Connection  matrices  (W)  for  100  neurons 

A: Connection matrix for Region 1. $r_{ex}$, the ratio of externally
excited neurons, is 30%. $r_f$, the percent of neurons allowed
to fire, is 30%. B: Connection matrix for Region 2. $r_{ex} =
r_f = 70\%$. The matrices are normalized and the darker colors
correspond to higher values (black: 1, white: 0).

While Fig. 2(A) resembles a random connection matrix, Fig. 2(B) represents
a sparse, yet almost fully connected structure. Only 4 out of 100 neurons
became isolated, while the number of connections was 1796 out of the possible
10,000. After normalization (by setting the maximum component of matrix
$W$ to 1), the average weight was about 0.24. Similar results were obtained in
a broader parameter range.

The following figure characterizes the size of the network. We compared
the resulting HebbNet structures with a random net, in which the same weights
of the dynamic network have been randomly assigned to different node pairs.
Fig. 3 clearly highlights the emerging small-world properties (i.e., high
clustering coefficients and short path lengths). Although the global connectivity
length was almost the same for all the HebbNets and their corresponding
random nets, local distances are much smaller in Region 2. That is, the connectivity
structure is sparse but information flow is still efficient. The Watts-Strogatz
[Figure 3 plot; horizontal axis: neurons ordered by magnitude of the local harmonic mean distance.]

FIGURE 3. Harmonic mean distances

This figure displays the local harmonic mean distances in
ascending order for both regions. For better visualization not
all data points are marked and the points are connected with
a solid line. Lines with diamonds belong to the network that
was developed using STDP learning. Lines with circles
belong to networks with the same but randomly redistributed
weights. Lines with empty (solid) markers represent a
HebbNet of Region 1 (Region 2). In Region 2 the emerging HebbNet
has much smaller local harmonic distances as compared to the
corresponding random net. The global harmonic mean distances
for the original and for the randomized networks in Region 1
(Region 2) are $D_h = 5.25$ and $D_h^r = 5.01$ ($D_h = 6.28$ and
$D_h^r = 4.98$), respectively.

(WS) rewiring model [Watts and Strogatz, 1998] is known to yield both
random and small-world structures. For $N = 100$ neurons with 6 connections
of 0.5 strength to each and with rewiring probability $p$ around 0.9, the
structure obtained by the WS model is in good agreement with our random net.
Our small-world net, on the other hand, can be characterized with rewiring
probability between 0.1 and 0.2.
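The random reference net used in the comparison above, in which the weights of the evolved network are reassigned to different node pairs, could be produced along the following lines. This is a sketch only: the function name `randomize_weights`, the decision to leave the diagonal untouched, and the optional `seed` argument are assumptions for illustration.

```python
import random


def randomize_weights(w, seed=None):
    """Return a copy of w whose off-diagonal entries are randomly permuted.

    The multiset of weights is preserved; only their assignment to
    (target, source) node pairs changes. Diagonal entries are kept as they are.
    """
    rng = random.Random(seed)
    n = len(w)
    positions = [(i, j) for i in range(n) for j in range(n) if i != j]
    values = [w[i][j] for i, j in positions]
    rng.shuffle(values)
    shuffled = [[w[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), value in zip(positions, values):
        shuffled[i][j] = value
    return shuffled


if __name__ == "__main__":
    w = [[0.0, 0.5, 0.0],
         [0.5, 0.0, 1.0],
         [0.0, 1.0, 0.0]]
    for row in randomize_weights(w, seed=1):
        print(row)
```

Since the weight values themselves are unchanged, any difference in distance measures between the two nets reflects the arrangement of the connections rather than their strengths.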

Limited scale-free properties seem to emerge for such small networks in
Region 2, as shown by Fig. 4. The robustness of the network to the magnitude
of the excitation is illustrated by the last figure. By increasing the excitation
level, the efficiency of the random net decreases drastically, down to one
third of its original level, whereas the efficiency of the small-world network
does not change much in the same region. However, for the network with
parameters $\tau_A = 0.5$, $\tau_L = 0.67$ (Fig. 5(A)), there is a sharp cut-off around
50%, where local distances suddenly drop in both networks, due to the high
ratio of excitations. Qualitatively similar behavior can be found for network
parameters $\tau_A = 0.5$, $\tau_L = 1.5$ (Fig. 5(B)), but the cut-off is around 90%.
Detailed results belonging to the parameter range given in Table 1 can be found
in our technical report [Palotai et al., 2002].

In summary, we have demonstrated that small-world networks with
scale-free domains may emerge under the STDP Hebbian learning rule. The
existence of such ‘HebbNets’ may support the speculative view of Kandel et al.
FIGURE 4. Cumulative weighted degree distribution

Cumulative distribution of the sum of the weights of outgoing
connections for the network ($r_{ex} = r_f = 70\%$, $N = 100$, the
number of histogram bins (discretization) is $d = 30$). $k^*$
denotes the discretized weighted connection number. The figure
shows scale-free properties (power-law distribution $P(k^*) \approx
k^{*-\gamma} e^{-k^*/\xi}$ with $\gamma \approx 1.61$ and $\xi \approx 20$) in a relatively broad
region.
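A cumulative distribution of this kind could be computed as sketched below. The discretization rule (scaling each outgoing weight sum by the largest one and rounding up to one of `bins` levels) is an assumption made for illustration; the report does not state how $k^*$ was obtained, and the function name `cumulative_out_strength` is hypothetical.

```python
import math


def cumulative_out_strength(w, bins=30):
    """Cumulative distribution of discretized outgoing weight sums.

    w[i][j] is the weight of the connection from node j to node i, so the
    outgoing weight sum of node j is the sum of column j.
    Returns a list of (k, fraction of nodes with k* >= k) for k = 1..bins.
    """
    n = len(w)
    strengths = [sum(w[i][j] for i in range(n) if i != j) for j in range(n)]
    top = max(strengths)
    if top <= 0:
        return [(k, 0.0) for k in range(1, bins + 1)]
    # Assumed discretization: k* in 0..bins, proportional to the weight sum.
    k_star = [math.ceil(s / top * bins) for s in strengths]
    return [(k, sum(1 for ks in k_star if ks >= k) / n)
            for k in range(1, bins + 1)]


if __name__ == "__main__":
    w = [[0.0, 1.0, 2.0],
         [0.5, 0.0, 2.0],
         [0.5, 1.0, 0.0]]
    for k, fraction in cumulative_out_strength(w, bins=4):
        print(k, fraction)
```

On a log-log plot of these pairs, a power law with an exponential cut-off appears as a straight segment that bends downward at large $k^*$.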


[Figure 5 plots, panels A and B: average local distance versus ratio of excitation.]

FIGURE 5. Average local distance vs. excitation ratio

These two figures demonstrate the system's robustness to the
magnitude of the excitation.

A: $\tau_A = 0.5$, $\tau_L = 0.67$. B: $\tau_A = 0.5$, $\tau_L = 1.5$. Diamonds:
average local distances for the evolving network. Circles: average
local distances for the corresponding random net.


[Kandel and O’Dell, 1992] that structural development and learning plasticity
in the CNS may have a common basis. Furthermore, our results support the
original suggestion of Edelman that learning in the CNS may have evolutionary
components [Edelman, 1987]. Present models on small-world networks may
Parameter    Values
$\tau_L$     1/2, 2/3, 1, 3/2, 2
$\tau_A$     1, 1/2, 1/4, 1/8, 1/16
$r$          0.01 - 0.99
$\theta$     0.1 - 0.9

Table 1. Parameter ranges for random inputs,
symmetric kernel function and 100 neurons
[Palotai et al., 2002]

$\tau_L$: ratio between strengthening and weakening time intervals
of the kernel; $\tau_A$: ratio between amplitudes (see Fig. 1).
$r$: percent of the neurons with external input. $\theta$: percent of
neurons allowed to fire (excitability).


allow for the following. At early stages of development, the evolutionary
component may be guided by preferential attachment like mechanisms. At later
stages, this component may be maintained by noise randomly generated within
the CNS.

As far as other evolving networks are considered, the profound implication
of our result is that local (Hebbian) learning rules may be sufficient to form
and maintain an efficient network in terms of information flow. This feature
differs from existing models, such as the model on preferential attachment
(which is equivalent to an increasing number of non-interacting surfing agents
with constant trapping rate) [Barabasi and Albert, 1999], the global
optimization scheme [i Cancho and Sole, 2001], and also from the original Watts and
Strogatz model [Watts and Strogatz, 1998].

3.  Competition,  which  Results  in  Collaboration 

Rapid development of Internet technologies increases the use of this unique
medium for collaboration. While interoperability is a main focus of these
collaborative efforts, privacy protection, along with reputation management, is in
demand. Recently several works have emerged that focus on these problems.
However, works on providing confidentiality and reputation are limited. For
example, most of the works focus on a particular problem and problem domain,
say reputation management for e-commerce applications. Considerably less
work has been done to accommodate novel requirements, such as collaborative
access requirements, or to develop protocols that allow anonymity while
supporting accountability. Furthermore, most of the existing works were developed
with a special application in mind and are not generally applicable.

3.1. Consequences of Novel Findings about the Internet. We assume,
without limiting generality, that the goal of the Internet is to publish.
Publication is seen here as a general way of accomplishing a task and making it
available to others in an understandable (readable) form. In turn, the goal
of collaboration on the Internet can be seen as editing to reach the goal in
an efficient manner. In our approach we take into account the general
features discovered over the last few years about publishing and collaboration.
These features can be best described in terms of networks. Take an author
(author A) and his/her co-authors. Say, author B is a co-author of author
A. This is a minimal network. The ‘distance’ between the two co-authors
is 1. Alternatively, there is a link between authors A and B. B has
co-authors, too. Author C is a co-author of author B, whereas author C is not
a co-author of author A. There are links between authors A and B and
authors B and C. However, there is no link between authors A and C. The
distance between authors A and C is equal to 2. A broad range of social networks
shows general features when formulated this way [Watts and Strogatz, 1998,
Barabasi and Albert, 1999, i Cancho and Sole, 2001]. The general features are
called the ‘small-world phenomenon’ and ‘scale-free networks’. For example,
similar networks have been discovered

(1)  amongst  Hollywood  actors  (collaboration  is  playing  in  the  same  movie), 

(2)  authorship  in  scientific  communications, 

(3)  social  networks  (’collaboration’  means  knowing  each  other),  and 

(4)  in  the  so  called  Erdos-point  in  mathematics. 
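The co-author distances described above (1 between A and B, 2 between A and C) can be reproduced with a breadth-first search over an undirected link list. This is a minimal sketch; the function name `distance` and the representation of the network as a list of name pairs are assumptions made for illustration.

```python
from collections import deque


def distance(links, start, goal):
    """Number of links on the shortest path between two authors, or None."""
    graph = {}
    for a, b in links:
        # Co-authorship is mutual, so each link is stored in both directions.
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return seen[node]
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return None  # the two authors are not connected


if __name__ == "__main__":
    links = [("A", "B"), ("B", "C")]
    print(distance(links, "A", "B"))
    print(distance(links, "A", "C"))
```

Breadth-first search visits authors in order of increasing distance, so the first time the goal is reached the stored count is the shortest one.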

One should design a security architecture for originally non-hierarchical networks.
Further, one may allow the development of a hierarchy based on the efficiency
of the collaborating partners. An example is the Erdos point, where there is a
clear hierarchical structure with Paul Erdos, the famous mathematician, at the
top of that hierarchy. Our model is called ‘Editorial Board’ and may have the
following ranks:

•  reader 

•  author 

•  reviewer 

•  board  member 

•  editor 

Other  structures  are  possible,  including  specialization  in  the  form  of  action 
editors,  to  mention  a  simple  extension. 

Hierarchy  can  be  imposed  in  a  rigid  fashion.  One  corresponding  structure  is 

•  customer/user

•  worker 

•  quality  assurance 

•  member  of  the  advisory  board 

•  CEO 

Note that the role of the reader or that of the user is special: they provide
the reinforcing feedback of the external world by purchasing the information
or the product.

At this point, we note the following. Starting from efficiency-based
self-organizing communities will favor competition, both within and
amongst communities, as a consequence. Competition will require security rules. Security rules
can be imposed at two different levels: (i) at the level of the self-organizing
community, and (ii) at the level of interaction between such communities. These
concepts are novel, so we give an example. If a particular self-organizing
community is made of readers and authors only, then the collaboration with another
community with stricter quality assurance, e.g., board members, action
editors, and an editor, will be limited. Limitations also depend on the definition of
readers. If the community is closed then special rules for interaction with other
communities need to be established. If anybody can be a reader, then
self-organizing collaborative efforts between communities may emerge. In turn,
some communities will be efficient whereas others will be less efficient. The
efficiency is governed by at least two factors:

(1)  The  efficiency  gained  by  collaboration 

(2)  The  slowing  down  of  project  organization  as  determined  by  the  time 
constraints  of  security  rules 

Note that we aim to develop concepts for collaborative schemes amongst
humans and Internet robots. The time constraints could be most restrictive for
large organizations.

A  good  way  to  think  of  such  communities  is  to  consider 

(1)  the  participants  of  the  community  as  agents, 

(2)  the  community  as  an  assembly  of  agents, 

(3)  the  security  rules  of  the  community  as  the  language  of  the  community, 

(4)  the  interaction  amongst  communities  as  communication  between  agents 
speaking  possibly  different  ‘languages’. 

As with real languages, there will be a competition amongst these languages.
This competition will favor languages which are simple for level communication
and which are abundant. The formulation of the S20S as agents speaking
languages is not the subject of this proposal. At this point, one may think of
establishing security rules,

(1)  which  meet  the  common  general  structure  of  self-organizing  human 
collaboration, 

(2)  which  allow  the  derivation  of  ’pre-wired’  hierarchies, 

(3)  which  allow  collaboration  amongst  communities, 

(4)  which  can  be  secure  if  required  by  the  founders  of  the  self-organizing 
community. 

The proposed model consists of independent, autonomous sub-systems,
governed by a set of local goals. The dynamics of these independent subsystems defines
a global, self-organizing model without the necessity of central authorization.
The aim is to develop a system architecture that supports collaboration while
guaranteeing information confidentiality, integrity, and availability. In addition,
anonymity of the initiator and responder is guaranteed, as well as
accountability, which is used to deter misuse or for billing services.

Our long-term goals are to provide privacy (hide the identity of initiator and
responder), provide mechanisms for secure online collaboration (protocols for what
can be shared between the parties, policy that defines who is allowed to
access what and at what level), and provide accountability (misuse detection or
billing purposes). Our assumption is that users may form ad-hoc communities
based on their interest and expertise. The same user may belong to different
communities in different roles or may assume different roles within a given
community.

In the editorial board example, all users are allowed to submit papers to
be reviewed, and all users are allowed to review papers. Both submitters and
reviewers are classified according to the quality of work they perform. Initially
all users and all papers are assigned an ‘initial’ impact factor. Based on the
submitted reviews, the impact factor of each publication may increase or
decrease. In addition, reviewers’ impact factors will be changed based on the
response to their comments, i.e., the level of update of the reviewed paper based
on the evaluation.

Additional restrictions are imposed on the model to guarantee security and
anonymity while preventing fraud by providing accountability. For example,
a user must not be able to change his/her impact factor by logging in as a new
user, by submitting the same paper again to get a better evaluation, or by
submitting high quality reviews of his/her own work. Although some of the above
concepts have been considered independently by researchers, there is no
complete model that supports both security and anonymity in a single, dynamic
environment while preserving accountability.

3.2. Self-Organizing System (SOS) for Online Collaboration. SOS may
be distributed and may have human as well as robotic participants. In order to
demonstrate the generality of our approach, a strictly robotic example is
provided here. Access control issues and some experiments on their efficiency
are provided to support our argument.

An Internet robot may represent a hierarchy. The robot resides on a host
computer and will be called the ‘hostess’. The hostess can launch topic
specific searches using crawlers, which download Internet documents one by one,
following the links of the documents. Intelligent crawlers adapt according to
the reinforcing signal received from the external world and transferred by the
hostess. Communication is possible amongst

• crawlers and

• hostesses

FIGURE 6. Hostess-crawler system

The Internet is explored by crawlers (which download and examine
documents), whereas communication with other entities is
governed by the hostess, which can launch and modify crawlers
based on reinforcements.

Communication can be direct or indirect. For example, if the hostess launches
several crawlers on the same topic and provides positive reinforcement only to
the one that brings the information first, indirect communication takes place.
This indirect communication manifests itself as competition. The competition
is addressed directly if each crawler maintains and refreshes a list of good
links (we call these weblogs), i.e., the links that have provided the most
positive reinforcement to the crawler. One may see these weblogs as local
sub-crawlers, which have starting points and search a smaller neighborhood.
The algorithm of weblog selection is provided in Fig. 7.


1) s ← startNode
2) FOR i = 1 TO memory
   a) s ← chooseRL(frontier)
   b) wl ← s,  w(s) ← $\sum_{k=1}^{\mathrm{memory}} r_k$
3) sort(wl, w(s)), head(wl, wlsize)
4) REPEAT (for every local search)
   a) w'(s) = w(s) * $\gamma$,  $\forall s \in$ wl
   b) s ← choose(wl)
   c) FOR i = 1 TO memory
      i) s ← chooseRL(frontier)
      ii) wl ← s,  w'(s) = w(s) + $\sum_{k=1}^{\mathrm{memory}} r_k$
      iii) sort(wl, w(s)), head(wl, wlsize)
5) UNTIL end of experiment


Figure  7.  Algorithm  for  forming  a  weblog 

The algorithm has three important parameters: (i) ‘memory’:
size of the environment around a node of the network that is examined
during the evaluation of that node, (ii) ‘wlsize’: size of the
weblog (number of links that can be remembered), and (iii)
‘$\gamma$’: discount factor of past rewards received around a node.
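
A minimal Python sketch of the weblog bookkeeping in Fig. 7 may help to fix ideas. It covers a single pass of steps 4a and 4c (ii, iii) only, and it rests on assumptions that are not stated in the figure: the weblog is stored as a dictionary from link to weight, the weight credited to a link is the sum of the rewards collected in the `memory` steps launched from it, and the function name `update_weblog` is hypothetical.

```python
def update_weblog(weblog, start, rewards, gamma, wlsize):
    """Return a new weblog after one local search launched from `start`.

    Assumption (not from the source): `weblog` maps link -> weight and
    `rewards` lists the rewards r_k collected in the `memory` steps of the search.
    """
    # Step 4a: discount every stored weight by gamma.
    decayed = {link: weight * gamma for link, weight in weblog.items()}
    # Step 4c-ii: credit the start link with the summed rewards.
    decayed[start] = decayed.get(start, 0.0) + sum(rewards)
    # Step 4c-iii: sort by weight (largest first) and keep the best `wlsize` links.
    ranked = sorted(decayed.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:wlsize])
```

With `gamma` below 1, a link that stops yielding rewards loses weight at every pass and eventually drops out of the `wlsize` best entries, so the list keeps tracking recently productive restart points.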


The origin of competition can be seen as follows. The small-world property
can trap crawlers in a neighborhood where links are abundant and point to
each other, so that crawlers might not find any novel information. To overcome
this limitation, each crawler may maintain a list of links that are worth
visiting. Assume two crawlers: one with a long list and another one with a
single link. The second crawler will find the novel information on that single
link with high probability. However, that information will no longer be novel
for the first crawler, which therefore will not be able to "publish" it at the
hostess. As a consequence, the first crawler will lower the value of this link
and will visit this site less and less frequently.

The downloaded part of the Internet and its division between the two
crawlers can be seen in Fig. 9.

In our model experiments, the competition decreased the age of the relevant
documents found, as shown in Figs. 10 and 11.

FIGURE 8. Two competing crawlers using weblogs in a "small world"
(connectivity matrix of competing crawlers)

The figure depicts the connectivity table (representing the
links) between nodes. Typical scale-free small-world connectivity
can be seen here: some nodes have many links pointing
onto them (rows), whereas others have a long list of links
pointing to other nodes (columns). Green (red) dots depict the links
used by the first (second) crawler. Efficient self-organizing
division of the world, i.e., the task space, can be seen. Weblogs
contained 10 links.


4.  Discussion 

The competitive-collaborative method has the following advantages.
Processing is local: each crawler modifies the values of known links based on
the reinforcement. There is a global improvement in performance.

Let us now consider interacting hostesses. Some of the hostesses (authors)
can be selected to be a ‘reviewer’, a member of the board, or the editor.
The selection could be based on their efficiency, or on a division of the
search into topics in which central hostesses have access to the context of
the search. Another intriguing possibility is that, in a general search
problem, search topics may also be subject to competition. Under this
condition, small worlds become larger (!) and the ‘diameter’ belonging to
topics is larger; however, searching may become more efficient. Central
hostesses may emerge as a result of this competition. The emergence of central
hostesses can be promoted by the type of interactions, i.e., the access
control, allowed by the robotic community (see below). We conclude this
paragraph by noting that the hostess-crawler system is one type of SOS,
whereas collaborating hostesses can form another type of SOS.

Interactions among the system components fall into two categories:

(1) Interactions within a single SOS

(2) Interactions among different SOSs

FIGURE 9. Division of a substructure of the Internet between
two crawlers

The gray region belongs to crawler A. Both crawlers maintain
a list of weblogs, which are efficient restart links to continue
the search when the novel information in the actual neighborhood has
been collected. Weblogs are ordered by their values and change
dynamically amongst crawlers during competition.


Each SOS is motivated to become stronger. The overall gain of an SOS, in this
competitive environment, depends on the success of other SOSs. The members
of a single SOS support each other by sharing relevant information. On the
other hand, no such information sharing is done between different SOSs.
Temporal sequences and synchronizations support this information sharing
process. If Crawler A discovers event p on site X then it might be useful to
send the message "new information on site X" to Crawler B. That information
can serve as the "context" for Crawler B and can help to accelerate the search.

That is, one should be looking for synchronous or spatio-temporal patterns.
The ideal tool, as it seems to us, is the two types of Hebbnet learning rules.
One may expect that Hebbnets with symmetric learning rules will represent
synchronous patterns, whereas Hebbnets with asymmetric learning rules
will represent spatio-temporal patterns. The expected prototype form of such
spatio-temporal patterns is an avalanche-like pattern, as in most sustained
systems.

FIGURE 10. Decrease of age of found novel documents
(average age of relevant documents found)

New documents appear, whereas old documents disappear in
this experiment. Competition settles around "time" 600. The
overlap amongst links searched by crawlers becomes minimized
by this time. Both crawlers have high connectivity weblogs.
The age of found novel documents is decreasing afterwards.

FIGURE 11. Increase of access to novel documents
(‘Divide and Work Efficiently’)

New documents appear, whereas old documents disappear in
this experiment. Competition settles around "time" 600. The
overlap amongst links searched by crawlers becomes minimized
by this time. Both crawlers have high connectivity weblogs.
The efficiency of finding novel documents is increasing afterwards.

The point to remember is that communication amongst crawlers or hostesses
can be

•  indirect  based  upon  reinforcement 

•  direct 

-  by  sending  documents  to  each  other, 

-  by  sending  elements  of  weblogs, 

-  by  sending  information  about  the  fact  that  a  novel  document  has 
been  found  ‘in  my  area’, 

•  or  combinations  of  any  of  these. 

The communication amongst crawlers or hostesses is defined by the access
control rules. These rules are of central importance because they determine
the type of SOS that will emerge. For each type of communication, the access
control model of the corresponding SOS must ensure that a component cannot
disclose or modify resources unless it has the proper authorization. These
access rights determine the type of co-operation and the distribution of
topics amongst the small subsystems.


5.  Conclusions 

The competitive-collaborative model is efficient according to our preliminary
studies. It allows one to impose competition simply via reinforcement.
Reinforcement gives rise to the division of tasks and improved efficiency within
subtasks.

The concept can be extended by communication methods other than the
indirect communication of the reinforcing signal. Other types of message
sending methods open questions about interactions amongst self-organizing
systems (SOS) and about the access control amongst them regarding the type of
information to be communicated.

The idea of establishing experience based connections using Hebbnets seems
most promising as the tool of collaboration, provided that access control issues
are solved. Learning rules of a Hebbnet can discover and, in turn, may allow
clustering of synchronous patterns as well as spatio-temporal patterns, most
probably in the form of avalanches.

Studies along these lines are beyond the scope of the present contract.
Considerations on different possibilities are currently in a basic
research phase. In particular, studies about

• access control rules

• learning and recognition of spatio-temporal patterns in early
phases

• secure communication methods

have been started and might reach a more mature state in about
three months.

Appendix
6. Crawlers

6.1. Introduction. The number of documents on the world-wide web is well
over 1 billion [Diligenti et al., 2000]. The number of new documents is over 1
million per day. The number of documents that change on a daily basis, e.g.,
documents about news, business, and entertainment, could be much larger.

This ever increasing growth presents a considerable problem for finding,
gathering, and ordering the information on the web. The only search engine that
may still warrant that the information it provides is not older than 1 month is
AltaVista¹. However, AltaVista indexes only about 250 million documents.
Google², on the other hand, indexes about 1,300 million pages, but does not
warrant any refresh rate for these documents.

The problem is complex: these search engines are not up to date, and
information gathering is not always efficient with them. Search engines
may offer too many documents, sometimes on the order of hundreds or many
thousands. Many web pages have no value, e.g., pages that merely carry a large
set of keywords or that are simply huge collections of documents originating
from broad sources.

Specialized, possibly personalized crawlers are needed. This problem represents
a real challenge for methods of artificial intelligence and has been tackled
by several research groups [Cho et al., 1998, Dean and Henzinger, 1999,
Chakrabarti et al., 1999a, McCallum et al., 1999, Kolluri et al., 2000, Lawrence, 2000,
McCallum et al., 2000, Mukherjea, 2000, Murdock and Goel, 1999]. One of the
first attempts in this direction was made by Chakrabarti et al. [Chakrabarti et al., 1999b],
who put forth the idea of focused crawling. To understand the idea, let us
consider crawling in general. Assume that ‘you are at a node’ of the web. This
node has been analyzed and you have to decide what to do next. It is quite
possible that relevant information can be found in the immediate neighborhood
of this node. Therefore, you download all the documents next to you and start
to analyze those documents. Doing so, you may or may not find relevant
documents. When you are done, you have the option to download all the
documents that are two steps away from you and to analyze those documents.

This approach is well known in the AI literature and is called the breadth
first technique. However, the world-wide web is ‘small’: the WWW had about 800
million nodes in 1999 and the number of minimal hops required to reach most
documents from any particular document was 19 [Albert et al., 1999]. Such a
connectivity structure between units is called a ‘small world’. As a result,
breadth first search incurs an enormous burden as a function of depth. At some
point (at a given depth) breadth first search needs to be abandoned and a
decision has to be made about which node to move to next. To decide on that
move, the values of the nodes need to be estimated from the point of view of the
goal of the search.

Focused  crawling  is  based  on  this  idea.  Focused  crawling  makes  an  attempt 
to  classify  the  content  of  the  document.  If  the  document  falls  into  the  search 
category  then  the  document  is  downloaded  and  the  links  of  the  documents  are 
followed. 
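
The focused crawling rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the crawler of Chakrabarti et al.: the web is replaced by a hypothetical in-memory dictionary `pages` mapping a URL to its text and outgoing links, and the classifier is an arbitrary predicate `is_relevant` supplied by the caller.

```python
from collections import deque


def focused_crawl(pages, start, is_relevant, max_downloads=100):
    """Follow links only out of documents that the classifier accepts.

    Assumption (not from the source): `pages` maps url -> (text, links)
    and stands in for the web; `is_relevant` stands in for the classifier.
    """
    queue, seen, kept = deque([start]), {start}, []
    downloads = 0
    while queue and downloads < max_downloads:
        url = queue.popleft()
        text, links = pages[url]   # the text must be fetched before it can be classified
        downloads += 1
        if not is_relevant(text):
            continue               # off-topic document: its links are not followed
        kept.append(url)
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return kept
```

Note that a relevant document that is reachable only through an off-topic document is never visited by this rule.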


¹ http://www.altavista.com

² http://www.google.com

Diligenti et al. [Diligenti et al., 2000] have recognized the pitfall of focused
crawling: searched information on the web is typically hidden. Sites of
particular interest may have a lower number of directed links than sites of
general interest. In turn, we might face the ‘needle in the haystack’ problem,
with the haystack being the sites of general interest. The hidden property is
thus the implicit consequence of our particular interest.

Let us consider sites dealing with support vector machines (SVMs). Sites
about SVMs are not typical on the web, and not all sites dealing with SVMs are
linked to each other. Therefore, focused crawling could be rather inefficient
and a direct search for SVM sites might fail. On the other hand, most of the
SVM sites are within (i.e., linked to) academic environments, or within sites
dealing with information technology. These topics are much more general and
might have many more links and a much higher ‘visibility’. Searching for the
environment of SVM sites could therefore be much more efficient. A hand-waving
argument can be given as follows. Documents are linked to each other. Links are
made by those for whom the document has value. These links form the one-step
context of the document. The one-step context, in turn, may be characteristic
of the document. The one-step environment of the document (i.e., documents
that are one step away), documents that are two steps away, etc., form the
‘context’ of the document. When we search for a document, by definition, we
shall encounter the environment of the document first. Hence we might first
search for the environment of the document. This is the idea behind ‘context
focused crawling’ (CFG) [Diligenti et al., 2000]. This idea, which is trivial for
graphs with high clustering probability (e.g., regular lattices), could be
criticized for the case of ‘small worlds’, where documents are, on average,
about as far away as the environment of the document. However, the question is
intriguing, because the visibility of the searched documents could be much
lower than that of their environment.

CFG does not take into consideration the variety of the web: environments
may differ. For example, the ‘one-step context’ for small universities or small
research institutes may correspond to the ‘two-step context’ for large
departments of large universities. If the order of contexts changes, then
CFG will get close to the documents but will miss them. In turn, the decision
whether to ‘stay and download’ at a given site or ‘not to download but move’ can
be seriously jeopardized. A fast adapting value estimation method may provide an
attractive solution to this search problem, where information is hidden within
not-yet-experienced environments. The environment of high value documents
can provide reinforcing feedback in a straightforward fashion. Interestingly,
reinforcement learning (RL) has not been found particularly efficient for
searching the world-wide web [Rennie et al., 1999]. The efficiency of RL,
however, depends strongly on feature extraction. It seems natural to explore
the CFG idea as the initial feature extraction method for RL. Here we combine
CFG with RL to search on the web.

6.2.  Methods. 

6.2.1. Preprocessing of texts. There is a large variety of methods that try to
classify texts [McCallum, 1996, Blum and Mitchell, 1998, Dumais et al., 1998,
Kaski, 1998, Chakrabarti et al., 1998, Kolenda et al., 2000, Mitchell, 1999, McCallum, 1999,
Hofmann, 2000, Nigam et al., 2000, Kaban and Girolami, 2000, Vinokourov and Girolami, 2000a,
Joachims, 2000, Dominich, 2000]. Most of these methods are based on special
dimension reduction. First, the occurrence, or sometimes the frequency, of
selected words is measured. The subset of all possible words (‘bag of words’
(BoW)) is selected by means of probabilistic measures. Different methods are
used for the selection of the ‘most important’ subset. The occurrences (0's
and 1's) or the frequencies of the selected words of the subsets are used to
characterize all documents. This low (typically 100) dimensional vector is
supposed to encompass important information about the type of the document.
Different methods are used to derive ‘closeness measures’ between documents
in the low dimensional spaces of occurrence vectors or frequency vectors.
The method can be used both for classification, i.e., the computation of
decision surfaces between documents of different ‘labels’ [Blum and Mitchell, 1998,
Dumais et al., 1998, Joachims, 2000, Nigam et al., 2000], and for clustering, a more
careful way of deriving closeness (or similarity) measures when no labels are
provided [McCallum, 1996, Kaski, 1998, Hofmann, 2000, Kaban and Girolami, 2000,
Vinokourov and Girolami, 2000b].
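
To make the bag-of-words representation concrete, the Python sketch below builds the occurrence and frequency vectors of a text for a given vocabulary and compares two vectors. It is only an illustration: the tokenizer, the function names, and the choice of cosine similarity as the ‘closeness measure’ are assumptions made here, since the text does not say which measure or word selection method was used.

```python
import math
import re
from collections import Counter


def bow_vectors(text, vocabulary):
    """Return (occurrence, frequency) vectors of `text` over `vocabulary`."""
    # Assumption: words are lower-cased runs of ASCII letters.
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    frequency = [counts[word] for word in vocabulary]
    occurrence = [1 if count > 0 else 0 for count in frequency]
    return occurrence, frequency


def cosine(u, v):
    """Cosine similarity, used here as one possible closeness measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

In practice the vocabulary would hold on the order of 100 selected words, as stated above; the selection step itself is not shown.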

We tried several BoW based classifiers on the ‘Call for Papers’ (CfP)
problem³. CfP is considered a benchmark classification problem of documents:
counting correctly classified and misclassified documents can be automated
easily by checking whether or not the document contains the three word phrase
‘Call for Papers’. Classifiers were developed for one-step, two-step, etc.
environments of CfP documents. We found that these classifiers perform poorly
on the CfP problem. In agreement with published results [Dumais et al., 1998],
supervised SVM classification was superior to other methods. SVM was simple
and somewhat better than Bayes classification. However, SVM requires a large
number of support vectors for the CfP problem.


6.2.2. SVM classification. The SVM classifier operates similarly to
perceptrons. SVM, however, has better generalizing capabilities; see, e.g., the
comprehensive book of Vapnik [Vapnik, 1995], a tutorial [Smola and Scholkopf, 1998],
comparisons with other methods [Guyon et al., 1992, Cristianini and Shawe-Taylor, 1999],
improved techniques [Keerthi et al., 1999], and references therein⁴. The trained
SVM was used in ‘soft mode’. That is, the output of the SVM was not a
decision (yes or no); instead, the output could take continuous values between
0 and 1. A saturating sigmoid function⁵ was used for this purpose. As a result,
(i) the non-linearity of the decision surface was not sharp, and (ii) for inputs
close to the decision surface the classifier provides a linear output. The
output of the sigmoid non-linearity can be viewed as the probability of a class.
These probabilities for the different classes are distinct yardsticks working
on possibly different features. The RL algorithm was used to estimate the value
of these yardsticks.
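
The ‘soft mode’ can be illustrated with a short Python sketch of the saturating sigmoid applied to a raw classifier output. The function name `soft_output` and the default slope `lam=1.0` are assumptions for illustration; the value of the slope parameter used in the experiments is not given in the text.

```python
import math


def soft_output(margin, lam=1.0):
    """Squash a raw SVM output (signed distance from the decision surface) into (0, 1).

    `lam` is the slope parameter of the sigmoid; its value here is hypothetical.
    """
    return 1.0 / (1.0 + math.exp(-lam * margin))
```

An input on the decision surface maps to 0.5, and for small inputs the output grows almost linearly with slope `lam / 4`, which is the linear regime mentioned in point (ii).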


³ The CfP problem is defined by deleting the phrase ‘call for papers’ from the
document, executing the search on the internet, and considering each document
that contains the phrase ‘call for papers’ a ‘hit’.

⁴ Note that SVM has no adjustable parameters.

⁵ $\mathrm{output} = 1 / (1 + \exp(-\lambda \cdot \mathrm{input}))$

6.2.3. Value estimation. There is a history of value estimation methods based
on reinforcement learning. Some of the important steps (judged subjectively)
are in the cited papers: [Korf, 1985, Minton, 1988, Sutton, 1988,
Watkins, 1989, Schmidhuber, 1991, Mahadevan and Connell, 1992, Dayan and Hinton, 1993,
Kaelbling, 1993, Rummery and Niranjan, 1996, Littman et al., 1995, Mataric, 1997,
Dietterich, 2000]. A thorough review of the literature and the history of RL
can be found in [Sutton and Barto, 1998]. In our approach, value estimation
plays a central role. Value estimation works on states ($s$) and provides a real
number, the value, that belongs to that state: $V(s) \in \mathbb{R}$. Value
estimation is based on the immediate rewards (e.g., the number of hits) that
could be gained at the given state by executing different actions (e.g.,
download or move). The value of a state (a node, for example) is the long-term
cumulative reward that can be collected starting from that state and using a
policy. A policy is a probability distribution over different actions for each
state: the policy determines the probability of choosing an action in a given
state. Policy improvement and the finding of an optimal policy are central
issues in RL. RL procedures can be simplified if all possible future states are
available and can be evaluated. This is our case. In this case one does not
have to represent the policy. Instead, one could evaluate all neighboring nodes
of the actual state and move to (and/or download) the one with the largest
estimated long-term cumulative reward, the estimated value. Typically one
includes random choices for a few percent of the steps. These random choices
are called ‘explorations’. The estimated value based greedy choice is called
‘exploitation’.
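
The simplified action choice described in this paragraph (evaluate all neighboring nodes, move to the one with the largest estimated value, and occasionally pick at random) can be sketched in Python as follows. The function name, the `estimate_value` callback, and the default exploration rate of 5% are assumptions chosen for illustration.

```python
import random


def choose_next(neighbors, estimate_value, epsilon=0.05, rng=random):
    """Pick the next node among `neighbors`.

    With probability `epsilon` a random neighbor is chosen (exploration);
    otherwise the neighbor with the largest estimated value (exploitation).
    All names and the default `epsilon` are hypothetical.
    """
    if not neighbors:
        return None
    if rng.random() < epsilon:
        return rng.choice(neighbors)          # exploration
    return max(neighbors, key=estimate_value)  # exploitation
```

Setting `epsilon=0.0` gives a purely greedy crawler with no exploration steps.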

If the downloaded document contains the phrase ‘call for papers’ then the
learning system receives an immediate reward of 1. If a downloaded document
does not contain this phrase then there is a negative reward (i.e., a
punishment) of -0.01. These numbers were rather arbitrary. The ratio between
reward and punishment and the magnitude of the parameter of the sigmoid function
do matter, because these parameters influence the learning capabilities. Our
studies were constrained to a fixed set of parameters. One may expect
improvements upon optimizing these parameters for a particular problem. In our
case, search over the internet was time consuming and prohibited this
optimization.
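
The reward scheme can be written down directly; the Python sketch below uses the values 1 and -0.01 quoted above. The function name and the case-insensitive substring test are assumptions made for illustration.

```python
def immediate_reward(text, hit=1.0, miss=-0.01):
    """Reward for one downloaded document: `hit` if it contains the phrase, else `miss`."""
    # Assumption: the phrase is matched case-insensitively as a plain substring.
    return hit if "call for papers" in text.lower() else miss
```

With these values, the undiscounted sum of rewards stays positive as long as there is more than one hit per 100 misses, which shows how the ratio of reward to punishment sets the tolerated download cost per hit.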

Figure 12. Context of the document

Document and its first and second ‘neighbors’.

Value estimation makes use of the following upgrade:

(5)  $V^{+}(s_t) = V(s_t) + \alpha \left( r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right)$

where $\alpha$ is the learning rate, $r_{t+1} \in \mathbb{R}$ is the immediate
reward, $0 < \gamma < 1$ is the discount factor, and subscripts $t = 1, 2, \ldots$
indicate the action number (i.e., time). This particular upgrade is called
temporal differencing with zero eligibility, i.e., the TD(0) upgrade. TD
methods were introduced by Sutton [Sutton, 1988]. An excellent introduction to
value estimation, including the history of TD methods and a description of the
applications of parameterized function approximators, can be found in
[Sutton and Barto, 1998]. Concerning details of the RL technique, (a) we used
eligibility traces, (b) as opposed to the description given above, we did not
need explorative steps because the environments can be very different and that
diminished the need for exploration, (c) we did not decrease the value of
$\alpha$ over time, in order to keep adaptivity, and (d) we approximated the
value function as follows

(6)  $V(s) \approx \sum_{i=1}^{n} w_i \, \sigma(\mathrm{SVM}^{(i)})$

where the output of the ith SVM (i.e., the ith component of the output) is
denoted by $\mathrm{SVM}^{(i)}$, $\sigma(\cdot)$ denotes the sigmoid function acting
on the outputs of the SVM classifiers, and $w_i$ is the weight (or relevance) of
the ith classifier determined by upgrade Eq. 5. If the quality of the upgrade
is measured by the mean square error of the estimations then the following
approximate weight upgrade can be derived for the weights (see, e.g.,
[Sutton and Barto, 1998] for details):

(7)  $\Delta w_i = \alpha \left( r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right) \sigma(\mathrm{SVM}^{(i)})$.

This upgrade, extended with eligibility traces [Sutton, 1988, Sutton and Barto, 1998],
was used in our RL engine.
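
The linear value approximation of Eq. 6 and the weight upgrade of Eq. 7 can be sketched in Python as follows. This is a minimal illustration, not the RL engine used in the experiments: eligibility traces are omitted, the feature lists are assumed to hold the sigmoid outputs of the classifiers for the current and the next state, and the function names and default values of `alpha` and `gamma` are hypothetical.

```python
def value(weights, features):
    """Eq. 6: weighted sum of the sigmoid outputs of the classifiers."""
    return sum(w * f for w, f in zip(weights, features))


def td0_update(weights, features, reward, next_features, alpha=0.1, gamma=0.9):
    """Eq. 7 without eligibility traces: one TD(0) step on the weights.

    `features` and `next_features` are the sigmoid outputs at states s_t and s_{t+1}.
    """
    td_error = reward + gamma * value(weights, next_features) - value(weights, features)
    # Each weight moves in proportion to the TD error and to its own feature.
    return [w + alpha * td_error * f for w, f in zip(weights, features)]
```

A classifier whose output was large when an unexpected reward arrived gains weight, so classifiers that predict hits well in the current neighborhood become more relevant during the search.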

6.3.  Features  and  learning  to  search. 

6.4. Breadth first crawler. A crawler is called a breadth first crawler if it
first downloads the document of the launching site, continues by downloading
the documents of all first neighbors of the launching site, then the documents
of the neighboring sites of the first neighbor sites, i.e., the documents of
the second neighbor sites, and so on.
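
The download order of a breadth first crawler can be sketched in Python on a toy link graph. The dictionary `links`, which maps a site to the sites it links to, and the depth limit `max_depth` are assumptions made for illustration; a real crawler would fetch documents instead of reading a dictionary.

```python
from collections import deque


def breadth_first_order(links, start, max_depth):
    """Return (site, depth) pairs in the order a breadth first crawler downloads them."""
    order, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        site, depth = queue.popleft()
        order.append((site, depth))
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in links.get(site, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return order
```

All first neighbors are listed before any second neighbor, so the number of downloads needed before moving one level deeper grows with the size of each level.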

6.4.1. Context focused crawler. A target document and its environment are
illustrated in Fig. 12. The goal is to locate the document by recognizing
its environment first and then the document within. The CFG method
[Diligenti et al., 2000] was modified slightly, in order to allow direct
comparisons between the CFG method and the CFG method extended by RL value
estimation, and the following procedure was applied. First, a set of
irrelevant documents was collected. The kth classifier was trained on (good)
documents k steps away from known target documents and on (bad) irrelevant
documents. The classifier was trained to output a positive number (‘yes’) for
good documents and to output a negative number (‘no’) for irrelevant
documents. The outputs were scaled into the interval (0,1) by using the sigmoid
function $\sigma(x) = (1 + \exp(-\lambda x))^{-1}$. If the output of the kth
classifier is close to 1 then, according to its decision surface, there is a
target document k steps away from the actual site/document. If more than one
classifier outputs ‘yes’ then only the best classifier is considered in CFG.
Other outputs are neglected. The CFG idea with SVM classifiers is shown in
Fig. 13(A). CFG maintains a list of visited links ordered according to the SVM
classification. One of the links belonging to the best non-empty classifier is
visited next (this procedure is called backtracking).

Figure 13. SVM based document classifiers. (A) Classification
of distance from document using SVM classifiers. The
CFG method maintains a list of visited links ordered according
to the SVM classification. One of the links belonging to the
best non-empty classifier is visited next. (B) Value estimation
based on SVM classifiers. Reinforcement learning is used to
estimate the importance of the different classifiers during
search.
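
The link selection of the CFG method in Section 6.4.1 can be sketched in Python as follows. The sketch rests on one reading of the text that should be treated as an assumption: the ‘best’ classifier for a document is taken to be the one with the largest sigmoid output, a ‘yes’ is an output above a hypothetical threshold of 0.5, and the ‘best non-empty classifier’ is the non-empty queue with the smallest distance k. The function names are hypothetical as well.

```python
def assign_queue(outputs, threshold=0.5):
    """Return the index of the winning classifier, or None if none says 'yes'.

    `outputs[j]` is the sigmoid output of the classifier for distance j + 1.
    The threshold is an assumption; the text does not specify one.
    """
    best = max(range(len(outputs)), key=lambda j: outputs[j])
    return best if outputs[best] > threshold else None


def next_link(queues):
    """Pop a link from the first non-empty queue (smallest distance first)."""
    for queue in queues:
        if queue:
            return queue.pop(0)
    return None  # nothing left to visit
```

When the queue for the smallest distance runs empty, the crawler falls back to links stored under the next classifier, which is the backtracking step.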

The problem of the CFG method can be seen by considering that neighborhoods
on the WWW may differ considerably. Even if the kth classifier is the
best possible such classifier for the whole web, it might provide poor results in
some (possibly many) neighborhoods. For example, if there is a large number of
connected documents that all promise a valuable document in their
neighborhood, but there is in fact none, then the CFG crawler will
download all of these worthless documents before moving further. It is more
efficient to learn which classifiers predict well and to move away from regions
that have great but unfulfilled promises.

It has been suggested that classifiers could be retrained to keep adaptivity
[Diligenti et al., 2000]. The retraining procedure, however, takes too long^6 and
can be ambiguous if CFG is combined with backtracking. Moreover, retraining
may require continuous supervisory monitoring and supervisory decisions.
Instead of retraining, we suggest determining the relevance of the classifiers
during the search.

^6 Training may take on the order of a day or so on a 700 MHz Pentium III according to our
experience.
Figure 14. Search pattern for breadth first crawler.

Search was launched from neutral site. A site is called neutral
if there are very few target documents in its environment.
Diameter of open circles is proportional to the number of target
documents downloaded. Edges are color coded. There are two
extremes. Dark blue: Site was visited at the early stage during
the search. Light blue: Recently visited site. (For further
details, see text.)


6.4.2. CFG and RL: Fast adaptation during search. Reinforcement learning
offers a solution here. If the prewired order of the classifiers is questionable then
we could learn the correct ordering. There is nothing to lose here, provided
that learning is fast. If prewiring is perfect then the fast learning procedure
will not modify it. If the prewiring is imperfect then proper weights will be
derived by the learning algorithm.

The outputs of the SVMs can be saved. These outputs can be used to estimate
the value of a document at any instant. Value is estimated by estimating
weights for each SVM and adding up the SVM outputs multiplied by these
weights. In turn, one can compute a value based ordering of the documents with
minor computational effort and this reordering can be made at each step. This
reordering of the documents replaces the prewired ordering of the CFG method.
The new architecture is shown in Fig. (13)(b).
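
As a minimal sketch of the value based reordering described above (not the code used in the experiments), the saved SVM outputs of each candidate document can be combined linearly and the candidates sorted by the result. The function names and the dictionary layout of the saved outputs are assumptions made for illustration; how the weights are trained is not shown here.

```python
def value(svm_outputs, weights):
    """Estimated value: SVM outputs multiplied by their weights and added up."""
    return sum(w * s for w, s in zip(weights, svm_outputs))

def reorder(saved_outputs, weights):
    """Return document ids sorted by estimated value, best first.

    saved_outputs maps a document id to the list of SVM outputs saved for it
    (hypothetical layout). Only a weighted sum per document is needed, so the
    ordering can be recomputed whenever the weights change.
    """
    return sorted(saved_outputs,
                  key=lambda doc: value(saved_outputs[doc], weights),
                  reverse=True)
```

For example, with saved outputs `{"a": [0.9, 0.1], "b": [0.2, 0.8]}`, the weights `[1.0, 0.0]` rank document `a` first, whereas the weights `[-1.0, 0.5]` rank document `b` first: the same saved outputs yield a different visiting order once the weights change.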

6.5.  Results  and  discussion.  The  CfP  problem  has  been  studied.  Search 
pattern  at  the  initial  phase  for  the  breadth  first  method  is  shown  in  Fig.  (14). 

Search  patterns  for  the  context  focused  crawler  and  the  crawler  using  RL 
based  value  estimation  are  shown  in  Fig.  (15)  and  Fig.  (16).  The  launching 
site  of  these  searches  was  a  ‘neutral  site’,  a  relatively  large  site  containing 
few  CfP  documents  (http://www.inf.elte.hu).  We  consider  this  type  of 
launching  important  for  web  crawling:  It  simulates  the  case  when  mail  lists 
are  not  available,  traditional  search  engines  are  not  satisfactory,  and  breadth 
FIGURE 15. Search pattern for context focused crawler.

Search was launched from neutral site. Diameter of open circles
is proportional to the number of target documents downloaded.
Edges are color coded. There are two extremes. Dark
blue: Site was visited at the early stage during the search.
Light blue: Recently visited site.


first  search  is  inefficient.  This  particular  site  was  chosen  because  breadth  first 
search  could  find  very  few  documents  starting  from  this  site. 

‘Scales’  on  Fig.  (15)  and  Fig.  (16)  differ  from  each  other  and  from  that 
of  Fig.  (14).  ‘True  surfed  scale’  would  be  reflected  by  normalizing  to  edge 
thickness.  Radius  of  open  circles  is  proportional  to  the  number  of  downloaded 
target  documents.  The  CFG  is  only  somewhat  better  in  the  initial  phase  than 
the  breadth  first  method.  Longer  search  shows  that  CFG  becomes  considerably 
better  than  the  breadth  first  method  when  search  is  launched  from  this  neutral 
site. 

Quantitative comparisons are shown in Fig. (17). According to the figure,
upon downloading 20,000 documents, the number of hits was about 50,
200, and 1000 for the breadth first, the CFG and CFG based RL crawlers,
respectively. These launches were conducted at about the same time. We shall
demonstrate that the large difference between the CFG and the CFG based RL method
is mostly due to the adaptive properties of the RL crawler.

There  are  two  site  types  that  have  been  investigated.  The  first  site  type  is 
the  neutral  site  that  has  been  described  before.  The  other  site  was  a  mail  server 
on  conferences.  Also,  for  some  examples  there  are  runs  separated  by  one  month 
(March,  2001).  A  large  number  of  summer  conferences  made  announcements 
during  this  month. 

First, let us examine the initial phase of the search. This initial phase of the
search (the first 200 downloaded documents) is shown in Fig. (18). According
to this figure downloading is very efficient from the mail server site on each
occasion. The (non-adapting) CFG crawler utilizing averaged weights is superior
Figure 16. Search pattern for CFC and reinforcement
learning.

Search was launched from neutral site. Diameter of open circles
is proportional to the number of target documents downloaded.
Edges are color coded. There are two extremes. Dark
blue: Site was visited at the early stage during the search.
Light blue: Recently visited site.


Figure 17. Results of breadth first, CFC and CFC
based RL methods. (Plot title: Comparisons between Crawlers.)


to  all  the  other  crawlers  —  almost  all  downloaded  documents  are  hits.  Close 
to  this  site  there  are  many  relevant  documents  and  the  ‘breadth  first  crawler’ 
is  also  efficient  here.  Nevertheless,  the  CFC  crawler  outperforms  the  breadth 
FIGURE 18. Comparisons between ‘neutral’ and
mail server sites in the initial phase. Reward and
punishment are given in the legend of the figure.
Differences between similar types are due to differences
in launching time. The largest time difference
between similar types is one month. Neutral site (thin
lines): http://www.inf.elte.hu. Mail list (thick lines):
http://www.newcastle.research.ec.org/cabernet/events/msg00043.html.
Search with ‘no adaptation’ (dotted line) was launched from
mail list and used average weights from another search that
was launched from the same place. (Plot title: Need for Adaptation.)


first  crawler  in  this  domain.  Launching  from  neutral  sites  is  inefficient  at  this 
early  phase.  Breadth  first  method  finds  no  hit  close  to  the  neutral  site  (not 
shown  in  the  figure). 

Middle  phase  of  the  search  is  shown  in  Fig.  (19).  Performance  in  the  middle 
phase  is  somewhat  different.  Sometimes,  launches  from  the  neutral  site  can 
find  excellent  regions.  The  CFG  crawler  is  still  competitive  if  launched  from 
the  mail  server.  Launches  from  the  mail  list  spanning  one  month  looked  similar 
to  each  other;  conference  announcements  barely  modified  the  results. 

Search results up to 20,000 documents are shown in Fig. (20). This graph
contains results from a subset of the runs that we have executed. These runs
were launched from different sites: the neutral site and the mail list, as well as a
third type, the ‘conference’ site: http://www.informatik.uni-freiburg.de/index.en.html.
The latter is known to be involved in organizing conferences.
Adapting RL crawlers collected a large number of documents from all site types
and during the whole (one month) time period. The rate of collection was
between 2% and 5%. In contrast, although the collection rate is close to 100% for
the CFG launched from the mail list site up to 200 downloads, the lack of
adaptation prevents this crawler from finding new target documents in circa 17,000
downloads at later stages. Taken together:


PRINCIPAL  INVESTIGATOR:  DR.  HABIL.  ANDRAS  LORINCZ 

Need  for  Adaptation 


FIGURE  19.  Comparisons  between  ‘neutral’  and  mail 
server  sites  up  to  2000  documents.  Same  conditions  as 
in  Fig.  (18) 


Number  of  Investigated  Documents 

FIGURE 20. Comparisons between different sites up to
20,000 documents. Same conditions as in Figs. (18) and
(19). Search with ‘no adaptation’ used average weights from
another search that was launched from the same place
(denoted by *).


(1)  Identical  launching  conditions  may  give  rise  to  very  different  results 
one  month  later. 

(2)  Starting  from  a  neutral  site  can  be  as  effective  as  starting  from  a  mailing 
list  for  the  adaptive  RL  crawler. 
Figure 21. Change of weights of SVMs upon downloads
from mail site. (Plot title: Weight Adaptation. Axis label:
Tuning Steps during Crawling.)

Horizontal axis: Occasions when weights were trained.


(3)  The  lack  of  adaptation  is  a  serious  drawback  even  if  the  crawler  is 
launched  from  a  mailing  list. 

The importance of adaptation is also demonstrated by the RL weights assigned
during search. These weights are shown in the following figures. Figure
(21) depicts the weights belonging to the different SVMs launched from the
mail list site. At the beginning of the search the weights are almost perfectly
ordered; the largest weight is given to the SVM that predicts a relevant document
‘one step away’ whereas the 4th and the 5th SVMs have the smallest
weights. That is, RL ‘pays attention’ to the first SVM and pays less attention
to the others. This order changes as time goes on. There are regions (at around
tuning step number 1700 on the horizontal axis) where most attention is paid
to the 5th SVM and less attention is paid to the others. This means that
the crawler will move away from the region. The order of importance changes
again when a rich region is found; the importance of the first SVM recovers
quickly and, in turn, crawling is dominated by the weight of the first SVM:
The crawler ‘stays’ and downloads documents.

‘Weight history’ is different at the neutral site (Fig. (22)). Up to about 100
downloads very few relevant documents were found at this site. The value of
the weight of the 5th SVM is slightly positive, whereas the values of the others are
negative. The 1st and the 2nd SVMs are weighted the ‘worst’; weights belonging
to these classifiers are large negative numbers. At this site, the order of SVMs
that were trained around target documents is not appropriate. The situation
changes quickly when a rich region is found. In such regions the 1st SVM takes
the lead. It is typical that the weight of the 5th SVM is ranked second. That is,
the adaptation concerns mostly whether the crawler should stay or if it should
Figure 22. Change of weights of SVMs in value estimation
for ‘neutral’ site. (Plot title: Weight Adaptation. Axis label:
Tuning Steps during Crawling.)

Horizontal axis: Occasions when value estimation was erroneous
and weights were trained.

move  ‘far  away’.  In  turn,  information  contained  by  the  ‘context’  is  relevant 
and  can  be  used  to  optimize  the  behavior  of  the  crawler. 

6.6. Conclusions. We have suggested a novel method for web search. The
method makes use of a combination of two popular AI techniques, support vector
machines (SVM) and reinforcement learning (RL). The method has a few
adapting parameters that can be optimized during the search. This parameterization
helps the crawler to adapt to different parts of the web. The outputs
of the SVMs, together, formed a set of ‘yardsticks’ for the estimation
of the distance from target documents. The value (the weight) of the different
yardsticks may be very different in different neighborhoods. The point is
that (i) RL is efficient with good features (the k-step SVMs in this case), and
(ii) if there are just a few parameters for RL then these parameters can be
trained quickly by rewarding for target documents. RL has many different
formulations, all of which could be applied here. Most promising are the approaches
that can take into account (many) different criteria in the search objective
[Fraser and Hauge, 1998, Gabor et al., 1998, Dubois et al., 2000]. Also,
RL methods are capable of extracting features [Thrun and Schwartz, 1995] that
may complement the prewired SVM features.
Abstract. The first task is to discover and characterize the substrate of
distributed computation, which is the Internet in our case. The second
task is to develop methods (models) for collaborative interactions.

According to recent discoveries, the Internet has a special structure,
called scale-free small world. To some extent, the structure of the Internet
is described by different models. In a parallel project we have introduced
a novel model, the HebbNet, which represents competitive-collaborative
activities in a more sensible way than other existing models of the Internet.
The particular properties of the Internet and HebbNets are described
here. Results are updated as compared to Report No. 3.

Second, we have downloaded a portion of the Internet. The working
and the gain of efficiency of the competitive-collaborative model were
studied on this downloaded part. A model for injecting novel information
into the downloaded portion of the Internet was developed. Computer
simulations demonstrate the idea: (i) competition gives rise to a division
of work, that is, a division of the searched portions among the reinforced
robotic Internet agents, (ii) the division of work gives rise to specialization and
increased efficiency for the specialized robotic agents.

Third, we have executed experiments on the Internet using our competition
based collaborative system. The experiments fully support
our previous efforts: crawlers form compartments and work more efficiently
by searching only these compartments. Collaboration occurs through
the division of work. Two summarizing figures are shown at the beginning
of this report.

The conclusion section (Section 5) contains information about
necessary future steps, which are not covered by the present
project and are in the basic research phase at this moment.

The Report is extended by Appendices:

(1) Descriptions of concepts and methods, such as reinforcement learning,
text classification, and SVMs, are provided in the Appendix.

(2) A large body of experiments on HebbNets is provided in an accompanying
technical report, titled: Meta level analysis of Hebbian
evolving networks http://people.inf.elte.hu/lorincz/Files/NIPG-ELU-21-04-2002.pdf
SUMMARY IN TWO FIGURES


FIGURE 1. Novel documents found by two competing crawlers. (Plot title: Novel documents on CNNMoney.)

Two competing crawlers are searching the CNN Money site for novel information
(information not yet known to the crawlers). During this search only
the crawler which delivers the information first is reinforced. The red crawler
starts better; the green crawler follows it most of the time (there is
no reward for this crawler). Suddenly, at around the 5000th download, the green
crawler finds other routes to follow. These new routes turn out to be rewarding:
the efficiency of the green crawler is about the same as that of the red crawler.
Both crawlers are inefficient between downloads 15,000 and 30,000; this is the
weekend. After the weekend, the novel routes found by the green crawler are
more efficient. The number of novel documents is much larger at the start of
the search when all documents are novel for the competing crawlers. This can
be seen by considering the slope of the curves at the beginning of the search.
For more details, see report.
FIGURE 2. Age of documents found by competing crawlers. (Plot title: Date of "novel" documents.)

Two competing crawlers are searching the CNN Money site for novel information
(information not yet known to the crawlers). In about half the cases we
could establish the date of the document. Red (green) dots: documents downloaded
by the red (green) crawler. Blue dots: documents downloaded by both
crawlers. After four days, there are still old documents found by the crawlers,
but the relative rate of novel documents increases strongly after the start
of the new week (at around reward number 2,700).

The number of novel documents is much larger at the start of the search
when all documents are novel for the competing crawlers. Note that 365 units
on the vertical scale correspond to a year. The crawlers collect the daily documents,
too, as shown by the slight slope of the maxima (the convex hull
of the maxima) of the points. For more details, see report.
Red, green and blue dots: links followed by red, green and both crawlers
1. Introduction

Cooperation  requires  some  knowledge  about  the  ‘arena’  of  cooperation.  In 
our  case,  this  arena  is  the  Internet.  Interestingly,  recent  discoveries  demon¬ 
strated  that  a  large  variety  of  networks,  such  as  social  networks,  scientific 
collaborative  networks,  hardware  networks  at  the  level  of  proxies,  Internet  con¬ 
nections,  html  links,  etc.,  share  the  same  structure.  Moreover,  the  suspicion 
has  been  increasing  that  all  (or  at  least  a  large  number  of)  ‘sustained’  systems 
share  the  same  common  connectivity  property,  provided  that  the  underlying 
concept  of  connectivity  is  discovered.  Two  concepts  have  to  be  explained  here: 

•  the  concept  of  a  sustained  system  and 

•  the  word  connectivity. 
A system is called sustained (i) if it is not closed from the environment and (ii)
if it is receiving energy from the environment. A cell of the body, the Earth in
the solar system, and a research group of a university are examples.

Connectivity is a loosely defined word to describe interaction amongst
collaborative-competitive units. It is a loose term, because in its crudest form
it is not concerned with the strength, but only with the existence of the interaction.
The usage of this word is as follows: If two units are engaged in some kind
of interaction then they are connected. Interaction can be directed or
mutual. For example, (mostly) directed interaction occurs between the wind
and the sailboat. Interactions are never fully directed; the sailboat influences
the local properties of the wind, although only to a minor extent. An example
of a more balanced interaction is the teacher-student relation, whereas
an almost fully balanced interaction may occur for some ants. Given the type(s) of
interaction, one ends up with a directed or with an undirected network of interacting
units.

Report  organization. 

(1)  First,  novel  concepts  of  the  Internet  are  provided.  This  part  of  the 
report  contains  the  description  of  our  model  of  the  Internet,  which  we 
call  HebbNets. 

(2)  The  second  part  contains  the  results  of  our  competitive-collaborative 
model.  Specialization  of  Internet  agents,  division  of  work  and  the  in¬ 
crease  of  efficiency  are  demonstrated  on  a  downloaded  part  of  the  In¬ 
ternet. 

(3)  Discussions  can  be  found  and  conclusions  are  drawn  in  Sections  4  and 
5,  respectively. 

(4) Details about intelligent crawlers are given in the Appendix.

(5) A large body of experiments on HebbNets is provided in an accompanying
technical report, titled: Meta level analysis of Hebbian evolving
networks http://people.inf.elte.hu/lorincz/Files/NIPG-ELU-21-04-2002.pdf

2. Internet and other networks

2.1. Short overview. Interactive systems with network structure have recently
become a fascinating area of research interest. Dynamic systems that
govern the formation of these networks are of central importance because of
the intriguing similarities among many biological, social, and information processing
networks. The original model of Watts and Strogatz [D. J. Watts and
S. H. Strogatz, Nature 393, 440, (1998)] explores random restructuring of links
among a finite number of ‘nodes’. Other works have dealt with growing structures
and optimization of link structure of finite systems. In this paper we
present and study the ‘HebbNets’, networks in which structural changes are
governed by Hebbian learning rules. We find that Hebbian learning is also
capable of developing all kinds of network structures, including small-world and
scale-free networks. Our results may support the idea of Edelman [G. M. Edelman:
Neural Darwinism, New York, Basic Books (1987)] that the development
of the central nervous system may have evolutionary components.

2.2. Introduction. The last few years have witnessed the evolution of novel
and efficient ways of describing complex interactive systems (CISs). The novel
description of a CIS is based on graphs with nodes and (directed) edges,
representing constituents of the system and the interactions among them.
Classification of CISs is based on the statistical properties of the network. Similar
network structures may be found in many different fields. Both these systems
and the corresponding dynamical models which define the formation of
these networks may be of fundamental importance to understand the behaviors
of CISs. The interest in CISs is boosted by the intriguing similarities of biological,
social, and information processing networks [Watts and Strogatz, 1998,
Kleinberg, 1998, Albert et al., 1999, Barabasi and Albert, 1999, Barabasi et al., 2000,
Marchioria and Latorac, 2000, Latora and Marchiori, 2001, Bohland and Minai, 2001,
i Cancho and Sole, 2001, Albert and Barabasi, 2002]. The original model of
the World Wide Web (WWW) by Watts and Strogatz [Watts and Strogatz, 1998]
explored random restructuring of the links among a finite number of ‘nodes’.
Barabasi and his colleagues introduced the concept of preferential attachment
to provide a more sophisticated model of the WWW [Albert et al., 1999,
Barabasi and Albert, 1999]. The idea has been extended to other types of networks
[Barabasi et al., 2000]. Another approach deals with optimization of link
structure of finite systems [i Cancho and Sole, 2001].

Probably the most complex network is inside us: the most exciting properties
of our brain have a lot to do with the special connection system among
its units. Although our knowledge of the building blocks is increasing, we are
still far from a complete understanding of the structure, and research on the
central nervous system (CNS) primarily targets these questions. At the
same time, many similar questions on the architecture of the WWW arise. The
recognition of the parallel nature resulted in the intuitive idea to highlight the
similarity between these distinct phenomena. On the one hand, it has been suggested
to use the mutual activity correlation (that is, the original form of Hebbian
learning) in modeling organizational learning [Kulkarni et al., 2000]. On
the other hand, similar structural characteristics of the web and the single completely
described nervous system, that of the nematode C. elegans, have been reported
[Watts and Strogatz, 1998]. In this paper we investigate the question whether
an evolving network, governed by a Hebbian rule, has the same or similar properties
as found by studying the web or social networks. The question is of central
importance: in seeking the answer we hope to find general underlying principles,
which give rise to small-world like structures in cooperative-competitive
systems.

It may be important to note here that concepts of Hebbian learning have undergone
revolutionary changes in the last few years. The original suggestion of
Hebb [Hebb, 1943] has been modified by recent findings [Markram et al., 1997,
Magee and Johnston, 1997, Bell et al., 1997]. (For a review, see, e.g. the work
of Abbott and Nelson [Abbott and Nelson, 2000]). The novel concept is called
spike-time dependent synaptic plasticity (STDP). The underlying mathematical
concepts could correspond to linear and non-linear versions of principal
component analysis [Oja, 1982, Abbott and Nelson, 2000].


2.3. Description of HebbNet. We intended to construct a model network
(HebbNet in short) in which the structural changes are governed by Hebbian
rules and the interaction with the environment and all the interacting elements
of the network are as simple as possible. We assume that the network is sustained
by inputs with no spatio-temporal structure; the input is random noise.
Our models consist of N simplified integrate-and-fire like ‘neurons’
or nodes. The dynamics of the internal activity is written as


$$\frac{\Delta a_i}{\Delta t} = \sum_{j} w_{ij}\, a_j^{s} + x_i^{(ext)} \qquad (1)$$

for $i = 1, 2, \ldots, N$. (N was 200 in our simulations.) Variable $x^{(ext)} \in (0,1)^N$
denotes the randomly generated input from the environment, $a_i$ is the internal
activity of neuron i. $w_{ij}$ is the ijth element of matrix W, i.e., the connection
strength from neuron j to neuron i. If $\Delta t = 1$ then we have a discrete-time
network and each parameter has a time index, or if $\Delta t$ is infinitesimally small
then Eq. 1 becomes a set of coupled differential equations. The neuron j
outputs a spike (neuron j fires) when $a_j$ exceeds a certain level, the threshold
parameter $\theta$. Spiking means that the output of the neuron $a_j^s$ (superscript
s stands for ‘spiking’) is set to 1. Otherwise, $a_j^s = 0$. Amount of excitation
received by neuron i from neuron j is $w_{ij} a_j^s$ when neuron j fires. After firing, $a_j$
is set to zero at the next time step. For continuous case $a_j$ is set to zero after a
very small time interval. Equation 1 describes the simplest form of ‘integrate-and-fire’
network models which is still plausible from a neurobiological point
of view. No temporal integration occurs for the discrete case provided that
the left hand side of Eq. 1 is replaced by $a_i^+$ where superscript + denotes time
shifting. In this limiting case, and if the threshold is high enough, ‘binary
neurons’ emerge. This model resembles the original model of McCullough and
Pitts [McCullough and Pitts, 1943].
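
The discrete-time case ($\Delta t = 1$) of Eq. 1 can be illustrated with a minimal Python sketch. It is not the simulation code behind the reported results. It assumes that a neuron that fired is reset to zero at the next step without integrating input during that step, and it draws the noise from [0, 1) with the standard library generator; the function name `step` is hypothetical.

```python
import random

def step(a, W, theta, rng):
    """One discrete-time update of the internal activities (Delta t = 1).

    a: list of internal activities; W[i][j]: strength from neuron j to neuron i.
    Returns (new_a, spikes), where spikes[j] is 1 if neuron j fired, else 0.
    """
    n = len(a)
    # A neuron fires when its internal activity exceeds the threshold theta.
    spikes = [1 if a[j] > theta else 0 for j in range(n)]
    new_a = []
    for i in range(n):
        if spikes[i]:
            new_a.append(0.0)  # reset after firing (assumed: no input this step)
            continue
        recurrent = sum(W[i][j] * spikes[j] for j in range(n))
        new_a.append(a[i] + recurrent + rng.random())  # external noise input
    return new_a, spikes
```

Calling `step` repeatedly with `rng = random.Random(0)` and a fixed matrix `W` yields a spike train per neuron; these spike times are the quantities that enter the synaptic modification rule.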

We examined the effect of local activity threshold and global activity constraint
(selection of a given percent of nodes with the highest activity). The
former one is more realistic biologically, while the latter one is more convenient:
in this way the ratio of active units is always known and fixed. For these two
cases, computer simulations showed negligible differences. Synaptic strengths
were modified as follows:


$$\frac{\Delta w_{ij}}{\Delta t} = \sum_{(t_i, t_j)} K(t_j - t_i)\, a_i^{s} a_j^{s} \qquad (2)$$


where  K  is  a  kernel  function  which  defines  the  influence  of  the  temporal  activity 
correlation  on  synaptic  efficacy  and  Awtj  / At  may  be  taken  over  discrete  or  over 
infinitesimally  small  time  intervals.  Possible  examples  are  depicted  in  Fig.  3. 
The  kernel  is  a  function  of  the  time  differences.  When  the  input  is  made  of 
noise,  as  in  our  studies,  only  the  ratio  of  the  positive  (strengthening)  and  the 
negative  (weakening)  parts  of  the  kernel  function  should  count.  This  is  the 
result  of  the  lack  of  temporal  correlations  in  the  input.  Temporal  grouping  and 
reshaping  of  the  kernel  would  not  modify  our  results  as  long  as  the  said  ratio 
is  kept  constant.  In  turn,  our  results  concern  both  types  of  kernels  depicted  in 
Fig.  3. 
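The role of the area ratio can be illustrated with a small sketch. The piecewise-constant kernel below, its sign convention (strengthening for positive time differences), the window width, and the function names are all assumptions made for illustration; they are not the kernels of Fig. 3. Activities are taken as binary (a = 1 at each spike time), so the sum in Eq. 2 reduces to a sum of kernel values over spike-time pairs.

```python
def step_kernel(dt, a_plus=1.0, a_minus=1.0, width=5):
    """Hypothetical piecewise-constant kernel K(dt) on integer time differences.

    Returns +a_plus for 0 < dt <= width, -a_minus for -width <= dt < 0,
    and zero elsewhere. The sign convention is an assumption.
    """
    if 0 < dt <= width:
        return a_plus
    if -width <= dt < 0:
        return -a_minus
    return 0.0


def area_ratio(kernel, width=5):
    """Ratio A+/A- of the summed positive and negative parts of the kernel."""
    values = [kernel(dt) for dt in range(-width, width + 1)]
    a_pos = sum(v for v in values if v > 0)
    a_neg = -sum(v for v in values if v < 0)
    return a_pos / a_neg


def weight_change(spikes_i, spikes_j, kernel):
    """Right-hand side of Eq. 2 for binary activities (a = 1 at each spike)."""
    return sum(kernel(t_j - t_i) for t_i in spikes_i for t_j in spikes_j)


def weak_kernel(dt):
    """Kernel whose strengthening part is ten times smaller than its weakening part."""
    return step_kernel(dt, a_plus=0.1, a_minus=1.0)


ratio = area_ratio(weak_kernel)
dw = weight_change([0, 10], [2, 9], weak_kernel)
```

With uncorrelated (noise) input, positive and negative time differences occur equally often, which is why only the ratio of the two areas matters in this picture.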

In the first place, we have been interested in the emerging local and global
connectivity structure of W. Instead of using the global structural property L
(the characteristic path length, i.e., the average number of edges on the shortest
path) and the clustering coefficient C proposed by Watts and Strogatz

Figure 3. Kernel functions

Two temporal kernels (kernel amplitude) as a function of the time difference
between the spiking times of neurons i and j ($t_i - t_j$). The relevant shape
parameter for noise-sustained systems is the ratio $r_{A^+/A^-}$ of the areas
(sums) of the positive and negative parts (components) of the kernel, $A^+$ and
$A^-$, respectively ($r_{A^+/A^-} = A^+/A^-$).


[Watts and Strogatz, 1998], we applied the so-called connectivity length measure
based on the concept of network efficiency [Latora and Marchiori, 2001].

This single measure is more appropriate for weighted networks [Marchioria and Latorac, 2000],
is equally applicable to describing global and local properties, and offers a
unified theoretical background to characterize our system. According to the
definition [Marchioria and Latorac, 2000, Bohland and Minai, 2001], the local
efficiency between nodes i and j in a weighted network with connectivity matrix W
is $e_{ij} = 1/d_{ij}$, where

$d_{ij} = \min_{n,\, k_1, \ldots, k_n} \left( 1/w_{ij},\ 1/w_{ik_1} + \ldots + 1/w_{k_{n-1}k_n} + 1/w_{k_n j} \right)$

($k_m \in (1, 2, \ldots, N)$ for every $1 < m < N - 1$ and $1 < n < N$). For graphs with
connection strengths of values 0 or 1, $d_{ij}$ corresponds to the shortest distance
between nodes i and j. The average of these values ($E[d_{ij}] = \frac{1}{N(N-1)} \sum_{i \neq j} e_{ij}$)
characterizes  the  efficiency  of  the  whole  network.  The  local  harmonic  mean 
distance  for  node  i  is  defined  as 

(3)  $D_h(i) = \frac{n_i}{\sum_{j:\, w_{ij} > 0} e_{ij}}$,

where $n_i$ is the number of neurons around neuron i with $w_{ij} > 0$. In terms of
efficiency, the inverse of this value describes how good the local communication
is amongst the first neighbors of node i with node i removed. It is a measure
of  the  fault  tolerance  of  the  system.  The  mean  global  distance  in  the  network 
is  defined  by  the  following  quantity: 


(4)  $D_h = \frac{N(N-1)}{\sum_{i,j} e_{ij}}$


Global  distance  provides  a  measure  for  the  size  (or  the  diameter)  of  the  net¬ 
work,  which  influences  the  average  time  of  information  transfer.  According 
to  [Marchioria  and  Latorac,  2000,  Bohland  and  Minai,  2001]  local  harmonic 
mean  distance  measure  behaves  like  1  /C  (inverse  of  the  clustering  coefficient) , 




whereas  the  global  value  corresponds  to  L.  It  can  be  shown  that  L  is  a  good 
approximation  of  Dh  (or  1  /L  for  the  global  efficiency)  under  certain  conditions 
[Bohland  and  Minai,  2001]. 
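As a rough illustration of Eq. 4, the sketch below computes the distances $d_{ij}$ by treating $1/w_{ij}$ as the length of a connection and running the Floyd-Warshall all-pairs shortest path algorithm, then forms the global harmonic mean distance. The choice of Floyd-Warshall, the function names, and the three-node example matrix are assumptions for illustration only; the text does not state how the distances were computed.

```python
import math


def shortest_distances(W):
    """All-pairs distances d_ij, using 1/w_ij as the length of connection i -> j."""
    n = len(W)
    d = [[0.0 if i == j else (1.0 / W[i][j] if W[i][j] > 0 else math.inf)
          for j in range(n)] for i in range(n)]
    for k in range(n):                      # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def global_harmonic_mean_distance(W):
    """D_h = N(N-1) / sum_{i != j} e_ij with e_ij = 1/d_ij (Eq. 4)."""
    n = len(W)
    d = shortest_distances(W)
    # Unreachable pairs have d_ij = inf and contribute an efficiency of 0.
    total_efficiency = sum(1.0 / d[i][j]
                           for i in range(n) for j in range(n) if i != j)
    if total_efficiency == 0:
        return math.inf
    return n * (n - 1) / total_efficiency


# Hypothetical three-node chain 0 - 1 - 2 with unit connection strengths.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
d_h = global_harmonic_mean_distance(chain)
```

For the chain, the pairwise efficiencies are 1, 1 and 0.5 in each direction, so the sum is 5 and $D_h = 6/5$.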

These connectivity length measures allowed us to study the emerging network
structures as a function of the following parameters: (i) the magnitude
of the external excitation (defined by the average percentage of neurons
receiving excitation from the environment) and (ii) the strengthening-weakening
area ratio of the kernel K. The binary neuron model was also investigated.
Figures 4 and 5 summarize our findings in different parameter regions. Fig. 4
displays the appearance of scale-free nets as a function of the excitation
level and $r_{A^+/A^-}$. The length of the scale-free regions was determined by first
plotting the distribution of the sum of the weights of outgoing connections
(averaged over 10000 samples taken from 20 networks) for every parameter set
studied. Results were depicted on a log-log plot. Supposing a power-law
distribution ($P(k^*) \propto (k^*)^{\gamma}$, where $k^*$ denotes the discretized values of the
connection strength), a linear fit was made to approximate $\gamma$. The width of
the scale-free region was estimated by the length of the region with power-law
distribution relative to the full length covered on the log scale. The maximum error
of the linear fit was set to $10^{-3}$ STD. That is, for 100 discretization points, the
width of a region spanning an order of magnitude on the log-log plot is equal
to 0.5.
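The linear fit on the log-log plot can be sketched as an ordinary least-squares fit of $\log P(k^*)$ against $\log k^*$, whose slope estimates $\gamma$. This is a minimal sketch under that assumption; the function name and the synthetic data are hypothetical, and the error criterion used to delimit the scale-free region is not reproduced here.

```python
import math


def fit_power_law_exponent(strengths, probabilities):
    """Least-squares slope of log10(P) versus log10(k*), i.e. an estimate of gamma."""
    xs = [math.log10(k) for k in strengths]
    ys = [math.log10(p) for p in probabilities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xy / s_xx


# Synthetic distribution that follows P(k*) = 1/k* exactly.
ks = list(range(1, 11))
ps = [1.0 / k for k in ks]
gamma = fit_power_law_exponent(ks, ps)
```

On data that follow an exact power law the fitted slope reproduces the exponent; on simulated distributions the fit would be restricted to the range where the residuals stay below the chosen error bound.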

Fig. 6 displays the emerging connections of a HebbNet for two different
parameter sets. We compared the resulting HebbNet structures with a random
net, in which the same weights of the dynamic network have been randomly
assigned to different node pairs. The two insets show the HebbNet connection
matrices. While inset (A), belonging to case (c) in Fig. 4, resembles a random
connection matrix, inset (B), belonging to case (d) in Fig. 4, represents a sparse
structure. (Note that most elements are not zero, but very small.)
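The randomized reference net can be sketched as a shuffle of the existing weights over node pairs. The sketch assumes that self-connections are excluded and that all ordered pairs (i, j) with i ≠ j take part in the shuffle; neither detail is stated in the text, and the function name is hypothetical.

```python
import random


def randomize_weights(W, seed=None):
    """Return a matrix with the same off-diagonal weights as W, randomly reassigned."""
    rng = random.Random(seed)
    n = len(W)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    weights = [W[i][j] for i, j in pairs]
    rng.shuffle(weights)                    # permute the weights among the node pairs
    R = [[0.0] * n for _ in range(n)]
    for (i, j), w in zip(pairs, weights):
        R[i][j] = w
    return R


# Hypothetical 3 x 3 weight matrix.
W = [[0.0, 0.9, 0.1],
     [0.2, 0.0, 0.3],
     [0.4, 0.5, 0.0]]
R = randomize_weights(W, seed=1)
```

Because only the assignment of weights to node pairs changes, the weight distribution is identical in both nets, and any difference in local distances can be attributed to structure.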

Fig. 6 clearly highlights the emerging small-world properties, i.e., small local
connectivity values (high clustering coefficients) for case (d). Although the
global connectivity length was almost the same for all HebbNets and their
corresponding random nets, local distances are much smaller in case (d). That
is, the connectivity structure is sparse, but information flow is still fault tolerant
and efficient.

The robustness of the network to the external excitation is illustrated in
Fig. 7. As the excitation level increases, the average local connectivity
length of the random net increases drastically, whereas the efficiency of
the small-world network does not change much in the same region. For
the network with parameter $r_{A^+/A^-} = 0.1$ (Fig. 7(A)), there is a sharp cut-off
around excitation level 0.55, where local distances suddenly drop, due to the high
ratio of excitation. Qualitatively similar behavior can be seen for $r_{A^+/A^-} = 0.6$
(Fig. 7(B)), but the cut-off is around $r_{ex} = 0.9$.

For networks with significant interaction we have experienced a convergence
of the exponent of the power-law distribution to -1. The width of the scale-free
region was relatively broad (see Fig. 8).

In summary, we have demonstrated that small-world architectures with
scale-free domains may emerge in sustained networks under the STDP Hebbian
learning rule without any other specific constraints on the evolution of the net.

FIGURE 4. Scale-free region with negligible interaction
Left: exponent of the power law; right: relative percentage
of the power-law domain as a function of $r_{A^+/A^-}$ and $r_{ex}$ (the
ratio of excited neurons). The contribution of other neurons to the
neuronal inputs is negligibly small. The difference between binary
and integrate-and-fire neurons disappears in this limiting case.
Results are averaged over 20 runs, all sampled 50 times, $\theta = 0.5$.
Stripes denote the unstable region: components of matrix W
may vanish. Log-log plots corresponding to points (a)-(d) are
shown in Fig. 5. Power law with negative (positive) exponent:
cases (a) and (d) (case (c)). Positive exponents are thresholded
to zero on the figure.


The  role  of  noise  in  the  central  nervous  system  [Ferster,  1996,  Miller  and  Troyer,  2002] 
is  unclear.  The  existence  of  such  ‘HebbNets’  may  support  the  speculative  view 
of  Kandel  et  al.  [Kandel  and  O’Dell,  1992]  that  structural  development  and 
learning  plasticity  in  CNS  may  have  a  common  basis.  According  to  our  results, 
evolution  and  plasticity  of  the  networks  may  be  maintained  by  noise  randomly 
generated within the CNS. We conjecture that the sustained nature of noise and
the competition imposed by small $r_{A^+/A^-}$ values are the two relevant
components of plasticity and learning. It might be equally important that exponents
of  HebbNets  with  significant  interaction  amongst  neurons  are  similar  in  a  broad 
range  of  parameters. 

As far as other evolving networks are concerned, the profound implication
of our result is that local (Hebbian) learning rules may be sufficient
to form and maintain an efficient network in terms of information flow.

This feature differs from existing models, such as the model of preferential
attachment [Barabasi and Albert, 1999], the global optimization scheme
[i Cancho and Sole, 2001], and also from the original Watts and Strogatz model
[Watts and Strogatz, 1998].

FIGURE 5. Log-log plots for different parameters

The four diagrams display typical distributions ($P(k^*)$) for the
parameters marked (a), (b), (c) and (d) in Fig. 4. Cases (a)
and (d) are arbitrary examples from the power-law region.


3.  Competition,  which  Results  in  Collaboration 

Rapid development of Internet technologies increases the use of this unique
medium for collaboration. While interoperability is a main focus of these
collaborative efforts, privacy protection, along with reputation management, is in
demand. Recently, several works have emerged that focus on these problems.
However, works on providing confidentiality and reputation are limited. For
example, most of the works focus on a particular problem and problem domain,
say reputation management for e-commerce applications. Considerably less
work has been done to accommodate novel requirements, such as collaborative
access requirements and developing protocols that allow anonymity while
supporting accountability. Furthermore, most of the existing works were developed
with a special application in mind and are not generally applicable.


3.1. Consequences of Novel Findings about the Internet. We assume -
without limiting generality - that the goal of the Internet is to publish.
Publication is seen here as a general way of accomplishing a task and making it
available for others in an understandable (readable) form. In turn, the goal
of collaboration on the Internet can be seen as editing to reach the goal in
an efficient manner. In our approach we take into account the general
features discovered over the last few years about publishing and collaboration.
These features can be best described in terms of networks. Take an author
(author A) and his/her co-authors. Say, author B is a co-author of author
A. This is a minimal network. The ’distance’ between the two co-authors

Figure 6. Harmonic mean distances

Local harmonic mean distances in ascending order are shown.
For better visualization, not all data points are marked and
the points are connected with a solid line. Lines with upward
triangle markers: STDP learning. Lines with circles: same
but randomly redistributed weights. Lines with empty (solid)
markers: HebbNet of case (c) (case (d)). Global harmonic
mean distances for the original and for the randomized
networks in case (c) of Fig. 4 (case (d) of Fig. 4) are about the
same: $D_h \approx D_h' \approx 5.5$ ($D_h \approx D_h' \approx 10$). The two insets show
the resulting connection matrices.


is 1. Alternatively, there is a link between authors A and B. B has
co-authors, too. Author C is a co-author of author B, whereas author C is not
a co-author of author A. There are links between authors A and B and
authors B and C. However, there is no link between authors A and C. The
distance between authors A and C is equal to 2. A broad range of social networks
shows general features when formulated this way [Watts and Strogatz, 1998,
Barabasi and Albert, 1999, i Cancho and Sole, 2001]. The general features are
called ’small-world phenomenon’ and ’scale-free networks’. For example,
similar networks have been discovered

(1)  amongst  Hollywood  actors  (collaboration  is  playing  in  the  same  movie), 

(2)  authorship  in  scientific  communications, 

(3)  social  networks  (’collaboration’  means  knowing  each  other),  and 

(4) in the so-called Erdos point in mathematics.
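The co-authorship distance described above is the length of the shortest chain of co-author links between two authors. A minimal sketch using breadth-first search follows; the dictionary layout and the function name are assumptions for illustration.

```python
from collections import deque


def collaboration_distance(coauthors, start, goal):
    """Number of co-author links on the shortest chain from start to goal.

    `coauthors` maps each author to the set of his/her co-authors.
    Returns None if no chain exists.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        author, dist = queue.popleft()
        for other in coauthors.get(author, ()):
            if other == goal:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


# The three-author example from the text: A-B and B-C are linked, A-C is not.
coauthors = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
d_ab = collaboration_distance(coauthors, "A", "B")
d_ac = collaboration_distance(coauthors, "A", "C")
```

Here `d_ab` is 1 and `d_ac` is 2, matching the distances given for authors A, B and C.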

One should design a security architecture for originally non-hierarchical networks.
Further, one may allow the development of a hierarchy based on the efficiency
of the collaborating partners. An example is the Erdos point, where there is a
clear hierarchical structure with Paul Erdos, the famous mathematician, on the

FIGURE 7. Average local distance vs. excitation ratio

A: $r_{A^+/A^-} = 0.1$, B: $r_{A^+/A^-} = 0.6$. Diamonds: average
local distances for the evolving network. Circles: average
local distances for the corresponding random net.


top of that hierarchy. Our model is called ’Editorial Board’ and may have the
following ranks:

•  reader 

•  author 

•  reviewer 

•  board  member 

•  editor 

Other  structures  are  possible,  including  specialization  in  the  form  of  action 
editors,  to  mention  a  simple  extension. 

Hierarchy  can  be  imposed  in  a  rigid  fashion.  One  corresponding  structure  is 

•  customer/user 

•  worker 

•  quality  assurance 

•  member  of  the  advisory  board 

•  CEO 

Note that the role of the reader or that of the user is special: they provide
the reinforcing feedback of the external world by purchasing the information
or the product.

At this point, we note the following. Starting from efficiency-based
self-organizing communities will favor competition, as a consequence, within and
amongst communities. Competition will require security rules. Security rules

Figure 8. Power-law with significant interaction
Left: exponent of the power law; right: relative percentage
of the power-law domain as a function of $r_{ex}$ and the excitation
threshold $\theta$. $r_{A^+/A^-} = 0.1$. Results are averaged over 700 steps.
Input from other neurons could exceed the external inputs by
a factor of 10. The exponent of the power law approximates -1
for broad regions of $\theta$ and $r_{ex}$. Outside this region the network
may vanish or may start to oscillate.


can  be  imposed  at  two  different  levels:  (i)  at  the  level  of  the  self-organizing  com¬ 
munity,  and  (ii)  at  the  level  of  interaction  between  such  communities.  These 
concepts are novel, so we give an example. If a particular self-organizing
community is made of readers and authors only, then the collaboration with another
community with stricter quality assurance (e.g., board members, action
editors, and an editor) will be limited. Limitations also depend on the definition of
readers. If the community is closed, then special rules for interaction with other
communities  need  to  be  established.  If  anybody  can  be  a  reader,  then  self¬ 
organizing  collaborative  efforts  between  communities  may  emerge.  In  turn, 
some  communities  will  be  efficient  whereas  others  will  be  less  efficient.  The 
efficiency  is  governed  by  -  at  least  -  two  factors: 

(1)  The  efficiency  gained  by  collaboration 

(2)  The  slowing  down  of  project  organization  as  determined  by  the  time 
constraints  of  security  rules 

Note  that  we  aim  to  develop  concepts  for  collaborative  schemes  amongst  hu¬ 
mans  and  Internet  robots.  The  time  constraints  could  be  most  restrictive  for 
large  organizations. 

A  good  way  to  think  of  such  communities  is  to  consider 

(1)  the  participants  of  the  community  as  agents, 

(2)  the  community  as  an  assembly  of  agents, 

(3) the security rules of the community as the language of the community,

(4) the interaction amongst communities as communication between agents
speaking possibly different ‘languages’.

As with real languages, there will be competition amongst these languages.
This competition will favor languages that are simple for level communication
and that are abundant. The formulation of the S20S as agents speaking
languages is not the subject of this proposal. At this point, one may think of
establishing security rules,

(1)  which  meet  the  common  general  structure  of  self-organizing  human 
collaboration, 

(2)  which  allow  the  derivation  of  ’pre-wired’  hierarchies, 

(3)  which  allow  collaboration  amongst  communities, 

(4)  which  can  be  secure  if  required  by  the  founders  of  the  self-organizing 
community. 

The  proposed  model  consists  of  independent,  autonomous  sub-systems,  gov¬ 
erned  by  a  set  of  local  goals.  Dynamics  of  these  independent  subsystems  defines 
a  global,  self-organizing  model  without  the  necessity  of  central  authorization. 
The aim is to develop a system architecture that supports collaboration while
guaranteeing information confidentiality, integrity, and availability. In addition,
anonymity of the initiator and responder is guaranteed, as well as
accountability, which is used to deter misuse or for billing services.

Our long-term goals are to provide privacy (hide the identity of initiator and
responder), provide mechanisms for secure online collaboration (protocols defining what
can be shared between the parties, policy that defines who is allowed to
access what and at what level), and provide accountability (misuse detection or
billing purposes). Our assumption is that users may form ad-hoc communities
based on their interest and expertise. The same user may belong to different
communities in different roles or may assume different roles within a given
community.

In  the  editorial  board  example,  all  users  are  allowed  to  submit  papers  to 
be  reviewed,  and  all  users  are  allowed  to  review  papers.  Both  submitters  and 
reviewers  are  classified  according  to  the  quality  of  work  they  perform.  Initially 
all  users  and  all  papers  are  assigned  an  ‘initial’  impact  factor.  Based  on  the 
submitted reviews, the impact factor of each publication may increase or
decrease. In addition, reviewers’ impact factors will be changed based on the
response to their comments, i.e., the level of update of the reviewed paper based
on the evaluation.

Additional restrictions are imposed on the model to guarantee security and
anonymity while preventing fraud by providing accountability. For example,
a user must not be able to change his/her impact factor by logging in as a new user,
submitting the same paper to get a better evaluation of the paper, or submitting high
quality reviews of his/her own work. Although some of the above concepts
have been independently considered by researchers, there does not exist a
complete model that supports both security and anonymity in a single, dynamic
environment, while preserving accountability.


3.2. Self-Organizing System (SOS) for Online Collaboration. SOS may
be distributed and may have human as well as robotic participants. In order to
demonstrate the generality of our approach, a strictly robotic example is
provided here. Access control issues and some experiments about their efficiency
are provided to support our argumentation.

An  Internet  robot  may  represent  a  hierarchy.  The  robot  is  on  a  host 
computer  and  will  be  called  ’hostess’.  The  hostess  can  launch  topic  specific 
searches  using  crawlers,  which  download  Internet  documents  one-by-one,  using 
the  links  of  the  documents.  Intelligent  crawlers  adapt  according  to  the  rein¬ 
forcing  signal  received  from  the  external  world  and  transferred  by  the  hostess. 
Communication  is  possible  amongst 


FIGURE  9.  Hostess-crawler  system 

Internet  is  explored  by  crawlers  (which  download  and  exam¬ 
ine  documents),  whereas  communication  with  other  entities  is 
governed  by  the  hostess,  which  can  launch  and  modify  crawlers 
based  on  reinforcements. 

•  crawlers  and 

•  hostesses 

Communication can be direct or indirect. For example, if the hostess launches
crawlers on the same topic and provides positive reinforcement only to the
one that brings the information first, indirect communication happens. This
indirect communication will manifest itself as a competition. The competition
will be directly addressed if crawlers maintain and refresh a list of good links -
we call those weblogs - which have so far provided the most positive reinforcement
for the crawler. One may see these weblogs as local sub-crawlers, which have
starting points and search a smaller neighborhood. The algorithm of weblog
selection is provided in Fig. 10.
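The indirect communication through reinforcement can be sketched as a hostess that rewards only the first delivery of each document. The class and method names, the unit reward, and the example document names are hypothetical; the text does not specify the reward values used.

```python
class Hostess:
    """Minimal sketch: reinforce only the crawler that delivers a document first."""

    def __init__(self):
        self.known = set()      # documents that have already been delivered
        self.rewards = {}       # cumulative reward per crawler

    def deliver(self, crawler, document):
        """Register a delivery and return the reward (1 if novel, otherwise 0)."""
        novel = document not in self.known
        self.known.add(document)
        reward = 1 if novel else 0
        self.rewards[crawler] = self.rewards.get(crawler, 0) + reward
        return reward


hostess = Hostess()
first = hostess.deliver("red", "doc-1")      # novel document: rewarded
second = hostess.deliver("green", "doc-1")   # already known: not rewarded
third = hostess.deliver("green", "doc-2")    # novel document: rewarded
```

A crawler that merely follows another one collects no reward under this rule, which is the pressure that pushes the crawlers towards different regions.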

The origin of competition can be seen as follows. The small-world property
can trap crawlers in a neighborhood where links are abundant and point to
each other. Crawlers might not find any novel information. To overcome this
limitation, each crawler may maintain a list of links that are worth visiting.
Assume two crawlers: one with a long list and another one with a single link.
The second crawler will find the novel information on that single link with high
probability. However, that information will no longer be novel for the first
crawler; therefore, the first crawler will not be able to "publish" it at the
hostess. As a consequence, the first crawler will lower the value of this link
and will visit this site less and less frequently.

1) s ← startNode
2) FOR i = 1 TO memory
   a) s ← chooseRL(frontier)
   b) wl ← s, w(s) ← $\sum_{k=1}^{memory} r_k$
3) sort(wl, w(s)), head(wl, wlsize)
4) REPEAT (for every local search)
   a) w'(s) = w(s)·γ, ∀ s ∈ wl
   b) s ← choose(wl)
   c) FOR i = 1 TO memory
      i) s ← chooseRL(frontier)
      ii) wl ← s, w(s) = w'(s) + $\sum_{k=1}^{memory} r_k$
      iii) sort(wl, w(s)), head(wl, wlsize)
5) UNTIL end of experiment


Figure  10.  Algorithm  for  forming  a  weblog 

The algorithm has three important parameters: (i) ‘memory’:
size of the environment around a node of the network, examined
during the evaluation of that node, (ii) ‘wlsize’: size of the
weblog (number of links that can be remembered), and (iii)
‘γ’: discount factor of past rewards received around a node.
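One possible reading of the weblog bookkeeping in Fig. 10 is sketched below: stored values are discounted by γ, the rewards collected around the newly evaluated node are added to its value, and only the `wlsize` best entries are kept. The function name, the dictionary representation, and the exact update are assumptions; node selection (`chooseRL`, `choose`) and the crawl itself are not modeled.

```python
def update_weblog(weblog, node, rewards, gamma, wlsize):
    """Return a new weblog after evaluating `node`.

    weblog  : dict mapping node -> value
    rewards : rewards r_k collected in the environment of `node`
    gamma   : discount factor applied to all stored values
    wlsize  : maximum number of entries kept
    """
    # Discount the values of all remembered nodes.
    updated = {s: w * gamma for s, w in weblog.items()}
    # Add the rewards collected around the evaluated node.
    updated[node] = updated.get(node, 0.0) + sum(rewards)
    # Sort by value and keep only the head of the list.
    best = sorted(updated.items(), key=lambda item: item[1], reverse=True)
    return dict(best[:wlsize])


weblog = {"a": 2.0, "b": 1.0}
weblog = update_weblog(weblog, "c", [1, 1, 1], gamma=0.5, wlsize=2)
```

In this example the old entries are halved, node "c" enters with value 3.0, and node "b" is dropped because only two entries can be remembered.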


The downloaded part of the Internet and its division between the two
crawlers can be seen in Fig. 12.

The competition gave rise to a decrease in the age of the relevant documents found
in our model experiments, as shown in Figs. 13 and 14.

3.3.  Experiments  on  the  Internet.  The  internet  experiments  used  two 
crawlers.  Crawlers  were  launched  from  the  CNN  Money  site  (http://www.cnnmoney.com). 
The crawlers ran for about five days, starting on September 12 and finishing
on September 16, 2002. The weekend was on September 14-15.

At first, all documents were novel to the crawlers, so the number of rewarded
downloads was large. This can be seen in Fig. 15. The two crawlers are
distinguished by colors. At the beginning, the green crawler mostly follows the
red crawler and receives very few results. At around the 5000th download the green
crawler finds other routes to follow. These new routes turn out to be rewarding:
the efficiency of the green crawler becomes about the same as that of the red
crawler in this region. Both crawlers are inefficient between downloads 15,000
and 30,000; this is the weekend. After the weekend, the novel routes found
by the green crawler are more efficient. The number of novel documents is
much larger at the start of the search, when all documents are novel for the
competing crawlers. This can be seen by considering the slope of the curves at
the beginning of the search.

In about half the cases we could establish the date of the document. Results
are shown in Fig. 16. After four days, some of the documents found by the

FIGURE 11. Two competing crawlers using weblogs in
a "small-world"

The figure depicts the connectivity table (representing the
links) between nodes. Typical scale-free small-world
connectivity can be seen here: some nodes have many links pointing
onto them (rows), whereas others have a long list of links
pointing to other nodes (columns). Green (red) dots depict the links
used by the first (second) crawler. Efficient self-organizing
division of the world, i.e., the task space, can be seen. Weblogs
contained 10 links.

crawlers are still ‘old’, but the relative rate of novel documents is increasing
strongly after the start of the new week (at around reward number 2,700).
The number of novel documents is much larger at the start of the search, when
all documents are novel for the competing crawlers. Note that 365 units on
the vertical scale correspond to a year. The crawlers collect the daily
documents, too, as shown by the slight slope of the maxima (the convex hull of
the maxima) of the points.

More  detailed  insight  can  be  gained  by  means  of  Fig.  17.  This  figure  contains 
only the most recent downloads, which are restricted to a few days and contain
a  large  number  of  very  recent  documents.  At  this  time  the  green  crawler 
is  more  efficient:  the  novel  routes  found  by  the  green  crawler  (forced  by  the 
competition)  happened  to  be  more  rewarding  during  this  last  day  of  the  studies. 

The  sharing  of  the  links  between  the  two  crawlers  is  depicted  in  the  following 
three  figures.  Figure  18  shows  the  full  time  region.  Blue  dots  represent  the 
links  followed  by  both  crawlers.  It  can  be  seen  that  during  the  full  time  domain, 
the  explored  regions  are  mostly  covered  by  both  crawlers. 

This  situation  changed  during  competition,  as  it  is  demonstrated  by  Figs.  19 
and  20.  These  figures  show  the  links  followed  in  the  last  25%  of  the  downloads 
and  the  last  10%  of  the  downloads,  respectively. 
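The degree of link sharing in a late part of the run can be quantified, for instance, as the fraction of links followed by both crawlers among all links followed by either of them. The sketch below takes the last fraction of each crawler's own link sequence; this per-crawler windowing, the overlap measure, and the function name are assumptions, since the figures are described only qualitatively.

```python
import math


def shared_link_fraction(links_a, links_b, last_fraction=1.0):
    """Fraction of links used by both crawlers in the last part of the run.

    links_a, links_b : chronologically ordered lists of (from_node, to_node) links
    last_fraction    : portion of each list, counted from the end, to consider
    """
    def tail(links):
        count = math.ceil(len(links) * last_fraction)
        return set(links[len(links) - count:])

    a, b = tail(links_a), tail(links_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical runs: both crawlers start on the same links, then diverge.
red = [(1, 2), (2, 3), (3, 4), (4, 5)]
green = [(1, 2), (2, 3), (5, 6), (6, 7)]
overall = shared_link_fraction(red, green)
late = shared_link_fraction(red, green, last_fraction=0.5)
```

In the hypothetical runs the overall overlap is one third, while the overlap in the second half is zero, mirroring the qualitative picture of shared exploration followed by a division of the link space.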

The last figure (Fig. 21) serves to underline our statement that the
improvement in crawler performance is due to the continuous improvement of weblogs.
We show the best 10 weblogs used by both crawlers as a function of changes



FIGURE 12. Division of a substructure of the Internet
between two crawlers

The  gray  region  belongs  to  crawler  A.  Both  crawlers  maintain 
a  list  of  weblogs,  which  are  efficient  restart  links  to  continue 
search  when  novel  information  in  the  actual  neighborhood  has 
been  collected.  Weblogs  are  ordered  by  their  values  and  change 
dynamically  amongst  crawlers  during  competition. 


made in the weblogs. Recall that changes are induced by reinforcement. At
the very beginning the overlap is clear: the crawlers use the same site. Newer
and newer nodes are promoted to the top ten part of the weblog list. Overlap
occurs when one of the crawlers becomes less efficient (see the blue region in the
middle of the figure). Most of the time, the surfing regions of the crawlers are
launched from distinct HTML pages.

The concept of selection of weblogs is the key to improving performance,
whereas time sharing and reinforcing only the first crawler provides the
improvement in efficiency. This particular form of reinforcement, in turn, gives
rise to collaboration amongst competitive agents.

FIGURE 13. Decrease of age of found novel documents
(Axes: cumulated average age of found nodes vs. time.)

New documents appear, whereas old documents disappear in
this experiment. Competition settles around "time" 600. The
overlap amongst links searched by crawlers becomes minimized
by this time. Both crawlers have high connectivity weblogs.
The age of found novel documents is decreasing afterwards.


FIGURE 14. Increase of access to novel documents.
(Axes: relevant nodes found vs. time.)

New documents appear, whereas old documents disappear in
this experiment. Competition settles around "time" 600. The
overlap amongst links searched by crawlers becomes minimized
by this time. Both crawlers have high connectivity weblogs.
The efficiency of finding novel documents is increasing afterwards.

FIGURE 15. Novel documents found by two competing
crawlers
(Plot title: Novel documents on CNNMoney.)

Two competing crawlers are searching the CNN Money site
for novel information (information not yet known to the
crawlers). During this search, only the crawler which delivers
the information first is reinforced.

Figure 16. Age of documents found by competing
crawlers

The age of the documents is shown as a function of reward order.
Red (green) dots: documents downloaded by the red (green)
crawler. Blue dots: documents downloaded by both crawlers.
Note that 365 units on the vertical scale correspond to a year.

4.  Short  discussion  and  future  directions 

The competitive-collaborative method has the following advantages.
Processing is local: each crawler modifies values of known links based on the
reinforcement. There is a global improvement in performance.

Figure 17. Age of documents found by competing
crawlers after four days

The age of the documents is shown as a function of reward order.
Red (green) dots: documents downloaded by the red (green)
crawler. Blue dots: documents downloaded by both crawlers.
Note that 365 units on the vertical scale correspond to a year.


Figure 18. Links used by crawlers
(Axis label: "From" node.)

Red, green and blue dots: links followed by the red crawler, the green
crawler, and both crawlers, respectively. All followed links are depicted.


Let us now consider interacting hostesses. Some of the hostesses (authors)
can be selected to be a ‘reviewer’, a member of the board, or the editor.
The selection could be based on their efficiency, or on the search being
divided into topics, with central hostesses having access to the context of
the search. Another intriguing possibility is that, in a general search
problem, search topics may also be subject to competition. Under this
condition, small-worlds become larger (!) and the ‘diameter’ belonging to
topics is larger; however, searching may become more efficient. Central
hostesses may emerge as a result of this competition. The emergence of
central hostesses can be promoted by the type of interactions, the access
control, allowed by the robotic community. We conclude this paragraph by
noting that the hostess-crawler system is one type of SOS, whereas
collaborating hostesses can form another type of SOS.

Figure 19. Links used by crawlers in the last quarter of the experiments
(panel title: all followed and rewarded links, last 25%)

Red, green and blue dots: links followed by the red crawler, the green
crawler and both crawlers, respectively. All followed links are depicted.
Sharing of the links has improved.

Figure 20. Links used by crawlers in the last 10% of the experiments
(axis label: "To" node)

Red, green and blue dots: links followed by the red crawler, the green
crawler and both crawlers, respectively. All followed links are depicted.
Further improvement in link sharing can be seen.

Figure 21. Top ten links used by the crawlers
(panel title: "Top ten" weblogs vs. "time"; horizontal axis: "time", the
ordered occurrences of changes)

Red, green and blue dots: links followed by the red crawler, the green
crawler and both crawlers, respectively. All followed links are depicted.
Further improvement in link sharing can be seen.

4.1. Future directions. Interactions amongst the components of the system
fall into two categories:

(1) Interactions within a single SOS

(2) Interactions among different SOSs

Each SOS is motivated to become stronger as a result of reinforcement. The
overall gain of an SOS, in this competitive environment, depends on the
success of the other SOSs. The members of a single SOS support each other by
sharing relevant information. On the other hand, no such information sharing
is done between different SOSs. Temporal sequences and synchronizations
support this information sharing process. If Crawler A discovers event p on
site X then it might be useful to send that information to Crawler B in the
form "new information on site X". That information can serve as the
"context" for Crawler B and can help to accelerate the search.

That is, one should be looking for synchronous or spatio-temporal patterns.
The ideal tool, as it seems to us, is provided by the two types of Hebbnet
learning rules. One may expect that Hebbnets with symmetric learning rules
will represent synchronous patterns, whereas Hebbnets with asymmetric
learning rules will represent spatio-temporal patterns. The expected
prototype form of such spatio-temporal patterns is an avalanche-like
pattern, as in most sustained systems.

The point to remember is that communication amongst crawlers or hostesses
can be

•  indirect, based upon reinforcement,

•  direct

-  by sending documents to each other,

-  by sending elements of weblogs,

-  by sending information about the fact that a novel document has
been found ‘in my area’,

•  or combinations of any of these.

The communication amongst crawlers or hostesses is defined by the access
control rules. These rules are of central importance: they determine the
type of SOS that will emerge. For each type of communication, the access
control model of the corresponding SOS must ensure that a component cannot
disclose or modify resources without the proper authorization. These access
rights determine the type of co-operation and the distribution of topics
amongst the small sub-systems.

5. Conclusions

The competitive-collaborative model is efficient according to our
preliminary studies. It allows one to impose competition simply via
reinforcement. Reinforcement gives rise to the division of tasks and
improved efficiency within subtasks.

The concept can be extended by communication methods other than the
indirect communication of the reinforcing signal. Other types of message
sending methods raise questions about interactions amongst self-organizing
systems (SOS) and about the access control amongst them regarding the type
of information to be communicated.

The idea of establishing experience based connections using Hebbnets seems
most promising as the tool of collaboration, provided that access control
issues are solved. Learning rules of a Hebbnet can discover and, in turn,
may allow clustering of synchronous patterns as well as spatio-temporal
patterns, most probably in the form of avalanches.

Studies along these lines are beyond the scope of the present contract.
Considerations of the different possibilities are currently in a basic
research phase. In particular, studies about

•  access control rules

•  learning and recognition of spatio-temporal patterns in early phases

•  secure communication methods

have been started and might reach a more mature state in about
three months.


Appendix 
6.  Crawlers 

6.1. Introduction. The number of documents on the world-wide web is well
over 1 billion [Diligenti et al., 2000]. The number of new documents is over
1 million per day. The number of documents that change on a daily basis,
e.g., documents about news, business, and entertainment, could be much
larger.

This ever increasing growth presents a considerable problem for finding,
gathering, and ordering the information on the web. The only search engine
that may still warrant that the information it provides is not older than
1 month is AltaVista¹. However, the number of pages indexed by AltaVista is
about 250 million. Google², on the other hand, is indexing about 1,300
million pages, but Google does not warrant any refresh rate for these
documents.

The problem is complex: these search engines are not up-to-date and
information gathering is not always efficient with them. Search engines may
offer too many documents, sometimes on the order of hundreds or many
thousands. Many web pages have no value, e.g., because they merely make use
of a large set of keywords, or are simply huge collections of documents
originating from broad sources.

Specialized, possibly personalized crawlers are needed. This problem
represents a real challenge for methods of artificial intelligence and has
been tackled by several research groups [Cho et al., 1998, Dean and
Henzinger, 1999, Chakrabarti et al., 1999a, McCallum et al., 1999, Kolluri
et al., 2000, Lawrence, 2000, McCallum et al., 2000, Mukherjea, 2000,
Murdock and Goel, 1999]. One of the first attempts in this direction was
made by Chakrabarti et al. [Chakrabarti et al., 1999b], who put forth the
idea of focused crawling. To understand the idea, let us consider crawling
in general. Assume that ‘you are at a node’ of the web. This node has been
analyzed and you have to decide what to do next. It is quite possible that
relevant information can be found in the immediate neighborhood of this
node. In turn, you download all the documents next to you and start to
analyze those documents. Doing so, you may or may not find relevant
documents. When you are done you have the option to download all the
documents that are two steps away from you and to analyze those documents.

This approach is well known in the AI literature and is called the breadth
first technique. However, the world-wide web is ‘small’: the WWW had about
800 million nodes in 1999 and the minimal number of hops required to reach
most documents from any particular document was 19 [Albert et al., 1999].
Such a connectivity structure between units is called a ‘small world’. In
turn, breadth first search incurs an enormous burden as a function of depth:
since so few hops reach most of the web, the number of documents to download
grows very rapidly with each additional step. At one point (at a given
depth) breadth first search needs to be abandoned and a decision has to be
made about which node to move to next. To decide on that move, the values of
the nodes need to be estimated from the point of view of the goal of the
search.

Focused  crawling  is  based  on  this  idea.  Focused  crawling  makes  an  attempt 
to  classify  the  content  of  the  document.  If  the  document  falls  into  the  search 
category  then  the  document  is  downloaded  and  the  links  of  the  documents  are 
followed. 

Diligenti et al. [Diligenti et al., 2000] have recognized the pitfall of
focused crawling: searched information on the web is typically hidden. Sites
of particular interest may have a lower number of directed links than sites
of general interest. In turn, we might face the ‘needle in the haystack’
problem, with the haystack being sites of general interest. The hidden
property is thus the implicit consequence of our particular interest.

¹ http://www.altavista.com

² http://www.google.com

Let us consider sites dealing with support vector machines (SVMs). Sites
about SVMs are not typical on the web. Not all sites dealing with SVMs are
linked. In turn, focused crawling could be rather inefficient and this
direct search for SVM sites might fail. On the other hand, most of the SVM
sites are within (i.e., linked to) academic environments, or within sites
dealing with information technology. These topics are much more general and
might have many more links and a much higher ‘visibility’. In turn,
searching for the environment of SVM sites could be much more efficient. A
hand-waving argument can be given as follows. Documents are linked to each
other. Links are made by those for whom the document has value. These links
form the one-step context of the document. The one-step context, in turn,
may be characteristic of the document. The one-step environment of the
document (i.e., documents that are one step away), documents that are two
steps away, etc., form the ‘context’ of the document. When we search for a
document, by definition, we shall encounter the environment of the document
first. In turn, we might first search for the environment of the document.
This is the idea behind ‘context focused crawling’ (CFC) [Diligenti et al.,
2000]. This idea, which is trivial for graphs with high clustering
probability (e.g., regular lattices), could be criticized for the case of
‘small worlds’, when documents, on average, are about as far as the
environment of the document. However, the question is intriguing, because
the visibility could be much less for searched documents than that of the
environment of the searched documents.

CFC does not take into consideration the variety of the web: environments
may differ. For example, the ‘one-step context’ for small universities or
small research institutes may correspond to the ‘two-step context’ for large
departments of large universities. If the order of contexts changes then CFC
will get close to the documents but will miss them. In turn, the decision
whether to ‘stay and download’ at a given site or ‘not to download but move’
can be seriously jeopardized. A fast adapting value estimation method may
provide an attractive solution to this search problem, where information is
hidden within not-yet-experienced environments. The environment of high
value documents can provide reinforcing feedback in a straightforward
fashion. Interestingly, reinforcement learning (RL) has not been found
particularly efficient for searching the world-wide web [Rennie et al.,
1999]. The efficiency of RL, however, depends strongly on feature
extraction. It seems natural to explore the CFC idea as the initial feature
extraction method for RL. Here we combine CFC with RL to search the web.

6.2.  Methods. 

6.2.1. Preprocessing of texts. There is a large variety of methods that try
to classify texts [McCallum, 1996, Blum and Mitchell, 1998, Dumais et al.,
1998, Kaski, 1998, Chakrabarti et al., 1998, Kolenda et al., 2000, Mitchell,
1999, McCallum, 1999, Hofmann, 2000, Nigam et al., 2000, Kaban and Girolami,
2000, Vinokourov and Girolami, 2000a, Joachims, 2000, Dominich, 2000]. Most
of these methods are based on special dimension reduction. First, the
occurrence, or sometimes the frequency, of selected words is measured. The
subset of all possible words (‘bag of words’ (BoW)) is selected by means of
probabilistic measures. Different methods are used for the selection of the
‘most important’ subset. The occurrences (0’s and 1’s) or the frequencies of
the selected words are used to characterize all documents. This low
dimensional (typically 100 dimensional) vector is supposed to encompass
important information about the type of the document. Different methods are
used to derive ‘closeness measures’ between documents in the low dimensional
spaces of occurrence vectors or frequency vectors.

The method can be used both for classification, i.e., the computation of
decision surfaces between documents of different ‘labels’ [Blum and
Mitchell, 1998, Dumais et al., 1998, Joachims, 2000, Nigam et al., 2000],
and for clustering, a more careful way of deriving closeness (or similarity)
measures when no labels are provided [McCallum, 1996, Kaski, 1998, Hofmann,
2000, Kaban and Girolami, 2000, Vinokourov and Girolami, 2000b].
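The occurrence and frequency vectors described above can be illustrated with a short Python sketch. It assumes that the word subset has already been selected; the probabilistic selection step is not shown, and the tokenizer, the function names, and the four-word vocabulary are hypothetical choices made only for this illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def occurrence_vector(text, vocabulary):
    """Return 0/1 entries: does each vocabulary word occur in the text?"""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocabulary]


def frequency_vector(text, vocabulary):
    """Return the relative frequency of each vocabulary word in the text."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero for empty documents
    return [tokens.count(word) / total for word in vocabulary]


# Hypothetical vocabulary; the text mentions about 100 selected words.
vocabulary = ["conference", "deadline", "submission", "football"]
doc = "Submission deadline extended: the conference deadline is now 1 May."
print(occurrence_vector(doc, vocabulary))
print(frequency_vector(doc, vocabulary))
```

Either vector can then be handed to a classifier or to a clustering method; which of the two representations works better is problem dependent.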

We tried several BoW based classifiers on the ‘Call for Papers’ (CfP)
problem³. CfP is considered a benchmark classification problem of documents:
counting correctly classified and misclassified documents can be automated
easily by checking whether or not the document contains the three-word
phrase ‘Call for Papers’. Classifiers were developed for one-step, two-step,
etc., environments of CfP documents. We found that these classifiers perform
poorly for the CfP problem. In agreement with published results [Dumais et
al., 1998], supervised SVM classification was superior to other methods. SVM
was simple and somewhat better than Bayes classification. However, SVM
requires a large number of support vectors for the CfP problem.

6.2.2. SVM classification. The SVM classifier operates similarly to
perceptrons. SVM, however, has better generalizing capabilities; see, e.g.,
the comprehensive book of Vapnik [Vapnik, 1995], tutorial material [Smola
and Scholkopf, 1998], comparisons with other methods [Guyon et al., 1992,
Cristianini and Shawe-Taylor, 1999], improved techniques [Keerthi et al.,
1999] and references therein⁴. The trained SVM was used in ‘soft mode’. That
is, the output of the SVM was not a decision (yes or no); instead, the
output could take continuous values between 0 and 1. A saturating sigmoid
function⁵ was used for this purpose. In turn, (i) the non-linearity of the
decision surface was not sharp, and (ii) for inputs close to the decision
surface the classifier provides a linear output. The output of the sigmoid
non-linearity can be viewed as the probability of a class. These
probabilities for the different classes are distinct yardsticks working on
possibly different features. The RL algorithm was used to estimate the value
of these yardsticks.
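As a minimal sketch of the ‘soft mode’, the following Python function squashes a signed decision value into the interval (0, 1) with a sigmoid. The function name and the default steepness `lam` are assumptions made for illustration; the text does not report the value of the sigmoid parameter that was used.

```python
import math


def soft_output(decision_value, lam=1.0):
    """Map a signed SVM decision value into (0, 1) with a sigmoid."""
    z = lam * decision_value
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # this branch avoids overflow for large negative z
    return e / (1.0 + e)


# On the decision surface the output is exactly one half.
print(soft_output(0.0))
```

Close to the decision surface the sigmoid is well approximated by the straight line 1/2 + lam * x / 4, which is the linear regime referred to in point (ii) above.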

³ The CfP problem is defined by deleting the phrase ‘call for papers’ from
the document, executing a search on the internet, and considering each
document that contains the phrase ‘call for papers’ a ‘hit’.

⁴ Note that SVM has no adjustable parameters.

⁵ $\mathrm{output} = \left(1 + \exp(-\lambda \cdot \mathrm{input})\right)^{-1}$

6.2.3. Value estimation. There is a history of value estimation methods
based on reinforcement learning. Some of the important steps, judged
subjectively, are in the cited papers: [Korf, 1985, Minton, 1988, Sutton,
1988, Watkins, 1989, Schmidhuber, 1991, Mahadevan and Connell, 1992, Dayan
and Hinton, 1993, Kaelbling, 1993, Rummery and Niranjan, 1996, Littman et
al., 1995, Mataric, 1997, Dietterich, 2000]. A thorough review of the
literature and the history of RL can be found in [Sutton and Barto, 1998].
In our approach, value estimation plays a central role. Value estimation
works on states ($s$) and provides a real number, the value, that belongs to
that state: $V(s) \in \mathbb{R}$. Value estimation is based on the
immediate rewards (e.g., the number of hits) that could be gained at the
given state by executing different actions (e.g., download or move). The
value of a state (a node, for example) is the long-term cumulative reward
that can be collected starting from that state and using a policy. A policy
is a probability distribution over different actions for each state: the
policy determines the probability of choosing an action in a given state.
Policy improvement and the finding of an optimal policy are central issues
in RL. RL procedures can be simplified if all possible future states are
available and can be evaluated. This is our case. In this case one does not
have to represent the policy. Instead, one can evaluate all neighboring
nodes of the actual state and move to (and/or download) the one with the
largest estimated long-term cumulated reward, the estimated value. Typically
one includes random choices for a few percent of the steps. These random
choices are called ‘explorations’. The greedy choice based on the estimated
value is called ‘exploitation’.
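The choice between exploitation and exploration can be sketched in a few lines of Python. This is only an illustration: the function name, the `explore_prob` parameter, and the example value table are hypothetical. The text later notes that explorative steps were not needed in the reported experiments, which corresponds to `explore_prob=0.0` here.

```python
import random


def choose_next(neighbors, estimate_value, explore_prob=0.05, rng=random):
    """Pick the neighbor to visit next.

    With probability explore_prob a random neighbor is chosen (exploration);
    otherwise the neighbor with the largest estimated value (exploitation).
    """
    if not neighbors:
        return None
    if rng.random() < explore_prob:
        return rng.choice(neighbors)
    return max(neighbors, key=estimate_value)


# Hypothetical value estimates for three neighboring nodes.
values = {"a.html": 0.2, "b.html": 0.9, "c.html": 0.4}
best = choose_next(list(values), values.get, explore_prob=0.0)
print(best)
```

Because every neighbor can be evaluated directly, no explicit policy has to be stored; the value estimates alone determine the move.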

If the downloaded document contains the phrase ‘call for papers’ then the
learning system receives an immediate reward of 1. If a downloaded document
does not contain this phrase then there is a negative reward (i.e., a
punishment) of -0.01. These numbers were rather arbitrary. The ratio between
reward and punishment and the magnitude of the parameter of the sigmoid
function do matter: these parameters influence learning capabilities. Our
studies were constrained to a fixed set of parameters. One may expect
improvements upon optimizing these parameters for a particular problem. In
our case, search over the internet was time consuming and prohibited this
optimization.
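The reward signal described above can be written down directly. In the following Python sketch the constant names, the function name, and the case-insensitive phrase matching are assumptions made for illustration; only the two numerical values come from the text.

```python
HIT_REWARD = 1.0     # reward for a document containing the phrase
MISS_REWARD = -0.01  # small punishment for any other download


def immediate_reward(document_text, phrase="call for papers"):
    """Return the immediate reward for one downloaded document."""
    return HIT_REWARD if phrase in document_text.lower() else MISS_REWARD


# With these values a single hit offsets the cost of 100 misses.
downloads = ["Call for Papers: Workshop on Web Mining"] + ["Staff directory"] * 100
total = sum(immediate_reward(d) for d in downloads)
```

The 100 to 1 ratio shows why the relative size of reward and punishment matters: it fixes how many fruitless downloads the crawler can afford per hit before the cumulated reward of a region turns negative.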

Figure 22. Context of the document
Document and its first and second ‘neighbors’.

Value estimation makes use of the following upgrade

(5)  $V^{+}(s_t) = V(s_t) + \alpha \left( r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right)$

where $\alpha$ is the learning rate, $r_{t+1} \in \mathbb{R}$ is the
immediate reward, $0 < \gamma < 1$ is the discount factor, and subscripts
$t = 1, 2, \ldots$ indicate the action number (i.e., time). This particular
upgrade is called temporal differencing with zero eligibility, i.e., the
TD(0) upgrade. TD methods were introduced by Sutton [Sutton, 1988]. An
excellent introduction to value estimation, including the history of TD
methods and a description of the applications of parameterized function
approximators, can be found in [Sutton and Barto, 1998]. Concerning details
of the RL technique, (a) we used eligibility traces, (b) as opposed to the
description given above, we did not need explorative steps because the
environments can be very different and that diminished the need for
exploration, (c) we did not decrease the value of $\alpha$ over time, in
order to keep adaptivity, (d) we approximated the value function as follows

(6)  $V(s) \approx \sum_{i=1}^{n} w_i \, \sigma(\mathrm{SVM}^{(i)})$

where the output of the ith SVM (i.e., the ith component of the output) is
denoted by $\mathrm{SVM}^{(i)}$, $\sigma(\cdot)$ denotes the sigmoid
function acting on the outputs of the SVM classifiers, and $w_i$ is the
weight (or relevance) of the ith classifier determined by the upgrade of
Eq. 5. If the quality of the upgrade is measured by the mean square error of
the estimations then the following approximate weight upgrade can be derived
for the weights (see, e.g., [Sutton and Barto, 1998] for details):

(7)  $\Delta w_i = \alpha \left( r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \right) \sigma(\mathrm{SVM}^{(i)})$.

This upgrade, extended with eligibility traces [Sutton, 1988, Sutton and
Barto, 1998], was used in our RL engine.
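The value approximation of Eq. (6) and the weight upgrade of Eq. (7) can be illustrated with a minimal Python sketch. It is a sketch under stated assumptions and not the RL engine of the text: the eligibility traces are omitted, the feature lists stand for the sigmoid outputs of the classifiers, and the function names as well as the values of `alpha` and `gamma` are hypothetical.

```python
def value(weights, features):
    """Eq. (6): weighted sum of the sigmoid-squashed classifier outputs."""
    return sum(w * f for w, f in zip(weights, features))


def td0_update(weights, features, reward, next_features, alpha=0.1, gamma=0.9):
    """Eq. (7) without eligibility traces; returns the updated weights."""
    td_error = (reward
                + gamma * value(weights, next_features)
                - value(weights, features))
    return [w + alpha * td_error * f for w, f in zip(weights, features)]


# Hypothetical sigmoid outputs of three classifiers in two successive states.
features_t = [0.9, 0.2, 0.1]
features_next = [0.1, 0.8, 0.3]
weights = [0.0, 0.0, 0.0]
weights = td0_update(weights, features_t, reward=1.0, next_features=features_next)
print(weights)
```

Starting from zero weights, a rewarded step raises each weight in proportion to the output of its classifier, so the classifier that responded most strongly before the hit gains the most relevance.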

6.3.  Features  and  learning  to  search. 

6.4. Breadth first crawler. A crawler is called a breadth first crawler if
it first downloads the document of the launching site, continues by
downloading the documents of all first neighbors of the launching site, then
the documents of the neighboring sites of the first neighbor sites, i.e.,
the documents of the second neighbor sites, and so on.
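The definition above corresponds to a standard breadth first traversal, sketched here in Python on an in-memory link graph. The function name, the `get_links` callback, the download limit, and the example graph are hypothetical; a real crawler would fetch and parse pages where this sketch only records the visiting order.

```python
from collections import deque


def breadth_first_order(start, get_links, max_documents=1000):
    """Return pages in breadth first order, starting from the launching site."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < max_documents:
        page = queue.popleft()
        order.append(page)  # stands in for downloading the document
        for link in get_links(page):
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order


# Hypothetical link graph: page -> outgoing links.
graph = {"start": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": ["start"]}
order = breadth_first_order("start", lambda page: graph.get(page, []))
print(order)
```

The first-in, first-out queue guarantees that all first neighbors are processed before any second neighbor, and the `visited` set prevents a page from being downloaded twice when links form cycles.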

Figure 23. SVM based document classifiers. (A) Classification of distance
from document using SVM classifiers. The CFC method maintains a list of
visited links ordered according to the SVM classification. One of the links
belonging to the best non-empty classifier is visited next. (B) Value
estimation based on SVM classifiers. Reinforcement learning is used to
estimate the importance of the different classifiers during search.

6.4.1. Context focused crawler. A target document and its environment are
illustrated in Fig. (22). The goal is to locate the document by recognizing
its environment first and then the document within. The CFC method
[Diligenti et al., 2000] was modified slightly, in order to allow direct
comparisons between the CFC method and the CFC method extended by RL value
estimation, and the following procedure was applied. First, a set of
irrelevant documents was collected. The kth classifier was trained on (good)
documents k steps away from known target documents and on (bad) irrelevant
documents. The classifier was trained to output a positive number (‘yes’)
for good documents and to output a negative number (‘no’) for irrelevant
documents. The outputs were scaled into the interval (0,1) by using the
sigmoid function $\sigma(x) = (1 + \exp(-\lambda x))^{-1}$. If the kth
classifier's output is close to 1 then, according to its decision surface,
there is a target document k steps away from the actual site/document. If
more than one classifier outputs ‘yes’ then only the best classifier is
considered in CFC. Other outputs are neglected. The CFC idea with SVM
classifiers is shown in Fig. (23)(a). CFC maintains a list of visited links
ordered according to the SVM classification. One of the links belonging to
the best non-empty classifier is visited next (this procedure is called
backtracking).
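The bookkeeping behind this procedure can be sketched in Python as a set of queues, one per classifier. Several details are assumptions made for this illustration and are not stated in the text: a link is filed under the classifier with the largest output, an output below 0.5 counts as ‘no’, links without any ‘yes’ go to a fallback queue, and the ‘best’ queue is taken to be the one with the lowest index (the nearest context). The class and method names are hypothetical.

```python
from collections import deque


class CFCFrontier:
    """Queues of unvisited links, one per context classifier plus a fallback."""

    def __init__(self, n_classifiers, threshold=0.5):
        self.threshold = threshold
        # queues[i] holds links filed under classifier i (index 0 = nearest
        # context); the last queue collects links with no 'yes' output.
        self.queues = [deque() for _ in range(n_classifiers + 1)]

    def add(self, link, outputs):
        """File a link under the classifier with the largest 'yes' output."""
        k = max(range(len(outputs)), key=lambda i: outputs[i])
        if outputs[k] < self.threshold:
            k = len(self.queues) - 1
        self.queues[k].append(link)

    def pop(self):
        """Return a link from the best (lowest index) non-empty queue."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


frontier = CFCFrontier(n_classifiers=2)
frontier.add("far.html", [0.1, 0.8])
frontier.add("near.html", [0.9, 0.3])
frontier.add("other.html", [0.2, 0.1])
```

Because `pop` always returns to the best non-empty queue, the crawler falls back to more distant contexts only when the nearer ones are exhausted, which is the backtracking behavior described above.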

The problem of the CFC method can be seen by considering that neighborhoods
on the WWW may differ considerably. Even if the kth classifier is the best
possible such classifier for the whole web, it might provide poor results in
some (possibly many) neighborhoods. For example, if there is a large number
of connected documents that all hold the promise of a valuable document in
their neighborhood, but there is, in fact, none, then the CFC crawler will
download all of these worthless documents before moving further. It is more
efficient to learn which classifiers predict well and to move away from
regions which have great but unfulfilled promises.

It has been suggested that classifiers could be retrained to keep adaptivity
[Diligenti et al., 2000]. The retraining procedure, however, takes too long⁶
and can be ambiguous if CFC is combined with backtracking. Moreover,
retraining may require continuous supervisory monitoring and supervisory
decisions. Instead of retraining, we suggest determining the relevance of
the classifiers during the search.

6.4.2. CFC and RL: Fast adaptation during search. Reinforcement learning
offers a solution here. If the prewired order of the classifiers is
questionable then we could learn the correct ordering. There is nothing to
lose here, provided that learning is fast. If the prewiring is perfect then
the fast learning procedure will not modify it. If the prewiring is
imperfect then proper weights will be derived by the learning algorithm.


⁶ Training may take on the order of a day or so on a 700 MHz Pentium III
according to our experience.

Figure 24. Search pattern for breadth first crawler.

The search was launched from a neutral site. A site is called neutral
if there are very few target documents in its environment. The diameter
of the open circles is proportional to the number of target documents
downloaded. Edges are color coded. There are two extremes. Dark blue:
the site was visited at an early stage of the search. Light blue:
recently visited site. (For further details, see text.)

The outputs of the SVMs can be saved. These outputs can be used to estimate
the value of a document at any instant. The value is estimated by estimating
weights for each SVM and adding up the SVM outputs multiplied by these
weights. In turn, one can compute a value based ordering of the documents
with minor computational effort and this reordering can be made at each
step. This reordering of the documents replaces the prewired ordering of the
CFC method. The new architecture is shown in Fig. (23)(b).
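The reordering step can be sketched in Python as a sort over the saved classifier outputs. The function name, the dictionary layout, and the example numbers are hypothetical; the sketch only shows that a change in the learned weights reorders the documents without reclassifying them.

```python
def reorder_by_value(saved_outputs, weights):
    """Sort documents by estimated value, highest first.

    saved_outputs maps a document to its stored sigmoid-squashed SVM outputs.
    """
    def estimated_value(doc):
        return sum(w * o for w, o in zip(weights, saved_outputs[doc]))

    return sorted(saved_outputs, key=estimated_value, reverse=True)


# Hypothetical stored outputs of two classifiers for three documents.
saved = {"a": [0.9, 0.1], "b": [0.2, 0.8], "c": [0.5, 0.5]}
print(reorder_by_value(saved, weights=[1.0, 0.0]))
print(reorder_by_value(saved, weights=[0.0, 1.0]))
```

The cost per step is one weighted sum per stored document plus a sort, which is small compared with retraining the classifiers.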

6.5. Results and discussion. The CfP problem has been studied. The search
pattern in the initial phase of the breadth first method is shown in
Fig. (24).

Search patterns for the context focused crawler and the crawler using RL
based value estimation are shown in Fig. (25) and Fig. (26). The launching
site of these searches was a ‘neutral site’, a relatively large site
containing few CfP documents (http://www.inf.elte.hu). We consider this type
of launching important for web crawling: it simulates the case when mail
lists are not available, traditional search engines are not satisfactory,
and breadth first search is inefficient. This particular site was chosen
because breadth first search could find very few documents starting from
this site.

The ‘scales’ of Fig. (25) and Fig. (26) differ from each other and from that
of Fig. (24). The ‘true surfed scale’ would be reflected by normalizing to
edge thickness. The radius of the open circles is proportional to the number
of downloaded target documents. CFC is only somewhat better in the initial
phase than the breadth first method. Longer search shows that CFC becomes
considerably better than the breadth first method when search is launched
from this neutral site.

Figure 25. Search pattern for context focused crawler.

The search was launched from a neutral site. The diameter of the open
circles is proportional to the number of target documents downloaded.
Edges are color coded. There are two extremes. Dark blue: the site was
visited at an early stage of the search. Light blue: recently visited
site.

Figure 26. Search pattern for CFC and reinforcement learning

The search was launched from a neutral site. The diameter of the open
circles is proportional to the number of target documents downloaded.
Edges are color coded. There are two extremes. Dark blue: the site was
visited at an early stage of the search. Light blue: recently visited
site.

Figure 27. Results of breadth first, CFC and CFC based RL methods.

Quantitative comparisons are shown in Fig. (27). According to the figure,
upon downloading 20,000 documents, the number of hits was about 50, 200,
and 1000 for the breadth first, the CFC, and the CFC based RL crawlers,
respectively. These launches were conducted at about the same time. We shall
demonstrate that the large difference between the CFC and the CFC based RL
method is mostly due to the adaptive properties of the RL crawler.
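
To make the comparison concrete, the hit counts quoted above can be turned
into collection rates. The short sketch below is only arithmetic on the
approximate values read off Fig. (27); the function name `harvest_rate` and
the variable names are illustrative assumptions, not part of the original
study.

```python
# Approximate hit counts quoted in the text for Fig. (27).
DOWNLOADS = 20_000
hits = {"breadth first": 50, "CFC": 200, "CFC based RL": 1000}


def harvest_rate(n_hits: int, n_downloads: int) -> float:
    """Fraction of downloaded documents that were target documents."""
    return n_hits / n_downloads


rates = {name: harvest_rate(h, DOWNLOADS) for name, h in hits.items()}
for name, rate in rates.items():
    print(f"{name:>14}: {rate:.2%} of downloads were hits")

# Relative gain of the RL crawler over the two non-adaptive crawlers.
rl_vs_cfc = hits["CFC based RL"] / hits["CFC"]
rl_vs_bf = hits["CFC based RL"] / hits["breadth first"]
```

With these values the rates come out at roughly 0.25%, 1%, and 5%, so the RL
crawler collects about five times as many hits as the CFC and about twenty
times as many as the breadth first crawler.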

Two site types have been investigated. The first is the neutral site
described before. The other is a mail server dealing with conferences. In
addition, for some examples there are runs separated by one month
(March, 2001). A large number of summer conferences made announcements
during this month.

First, let us examine the initial phase of the search (the first 200
downloaded documents), shown in Fig. (28). According to this figure,
downloading is very efficient from the mail server site on each occasion. The
(non-adapting) CFC crawler utilizing averaged weights is superior to all the
other crawlers: almost all downloaded documents are hits. Close to this site
there are many relevant documents, and the ‘breadth first crawler’ is also
efficient here. Nevertheless, the CFC crawler outperforms the breadth first
crawler in this domain. Launching from neutral sites is inefficient at this
early phase. The breadth first method finds no hit close to the neutral site
(not shown in the figure).
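
The difference between a breadth first crawler and a crawler that ranks its
frontier can be illustrated on a toy link graph. The sketch below is not the
CFC described in this paper: the graph, the target set, and the `SCORE` table
(a stand-in for a classifier output) are all hypothetical, and the ranked
variant is a generic best-first search.

```python
import heapq
from collections import deque

# Hypothetical toy link graph: page -> outgoing links.
LINKS = {
    "start": ["a", "b", "c"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "c": ["t0", "c1"],
    "t0": ["t1", "t2"],
    "a1": [], "a2": [], "b1": [], "b2": [], "c1": [], "t1": [], "t2": [],
}
TARGETS = {"t0", "t1", "t2"}
# Hypothetical relevance scores standing in for a classifier's output.
SCORE = {"c": 0.8, "t0": 0.9, "t1": 0.9, "t2": 0.9}


def crawl(start, budget, best_first):
    """Download at most `budget` pages; return the targets found, in order."""
    counter = 0  # tie-breaker so equal scores pop in insertion order
    frontier = [(0.0, counter, start)] if best_first else deque([start])
    seen, found, downloads = {start}, [], 0
    while frontier and downloads < budget:
        page = heapq.heappop(frontier)[2] if best_first else frontier.popleft()
        downloads += 1
        if page in TARGETS:
            found.append(page)
        for link in LINKS[page]:
            if link in seen:
                continue
            seen.add(link)
            if best_first:
                counter += 1
                # heapq is a min-heap, so the score is negated.
                heapq.heappush(frontier, (-SCORE.get(link, 0.1), counter, link))
            else:
                frontier.append(link)
    return found


bfs_found = crawl("start", budget=5, best_first=False)
ranked_found = crawl("start", budget=5, best_first=True)
```

In this toy setting the breadth first crawler spends its five downloads on
the pages nearest to the start and reaches no target, while the ranked
crawler follows the high-scoring path and collects all three targets within
the same budget.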

The middle phase of the search is shown in Fig. (29). Performance in the
middle phase is somewhat different. Sometimes, launches from the neutral site
can find excellent regions. The CFC crawler is still competitive if launched
from the mail server. Launches from the mail list over the one-month period
looked similar to each other; conference announcements barely modified the
results.

[Plot title: Need for Adaptation]

FIGURE 28. Comparisons between ‘neutral’ and mail server sites in
the initial phase. Reward and punishment are given in the legend
of the figure. Differences between similar types are due to
differences in launching time. The largest time difference between
similar types is one month. Neutral site (thin lines):
http://www.inf.elte.hu. Mail list (thick lines):
http://www.newcastle.research.ec.org/cabernet/events/msg00043.html.
Search with ‘no adaptation’ (dotted line) was launched from mail
list and used average weights from another search that was
launched from the same place.


[Plot title: Need for Adaptation]

FIGURE 29. Comparisons between ‘neutral’ and mail server sites up
to 2000 documents. Same conditions as in Fig. (28).

FIGURE 30. Comparisons between different sites up to 20,000
documents. Same conditions as in Figs. (28) and (29). Search with
‘no adaptation’ used average weights from another search that was
launched from the same place (denoted by *).

Search results up to 20,000 documents are shown in Fig. (30). This graph
contains results from a subset of the runs that we have executed. These runs
were launched from different sites: the neutral site and the mail list, as
well as a third type, the ‘conference’ site:
http://www.informatik.uni-freiburg.de/index.en.html.
The latter is known to be involved in organizing conferences. Adapting RL
crawlers collected a large number of documents from all site types and during
the whole (one month) time period. The rate of collection was between 2% and
5%. In contrast, although the collection rate is close to 100% for the CFC
launched from the mail list site up to 200 downloads, the lack of adaptation
prevents this crawler from finding new target documents in circa 17,000
downloads at later stages. Taken together:

(1)  Identical  launching  conditions  may  give  rise  to  very  different  results 
one  month  later. 

(2)  Starting  from  a  neutral  site  can  be  as  effective  as  starting  from  a  mailing 
list  for  the  adaptive  RL  crawler. 

(3)  The  lack  of  adaptation  is  a  serious  drawback  even  if  the  crawler  is 
launched  from  a  mailing  list. 

The importance of adaptation is also demonstrated by the RL weights assigned
during search. These weights are shown in the following figures. Figure (31)
depicts the weights belonging to the different SVMs for the search launched
from the mail list site. At the beginning of the search the weights are almost
perfectly ordered; the largest weight is given to the SVM that predicts a
relevant document ‘one step away’, whereas the 4th and the 5th SVMs have the
smallest weights. That is, RL ‘pays attention’ to the first SVM and pays less
attention to the others. This order changes as time goes on. There are regions
(at around

[Plot title: Weight Adaptation. Axis label: Tuning Steps during Crawling]

Figure 31. Change of weights of SVMs upon downloads from mail site.

Horizontal axis: Occasions when weights were trained.


tuning step number 1700 on the horizontal axis) where most attention is paid
to the 5th SVM and less attention is paid to the others. This means that the
crawler will move away from the region. The order of importance changes again
when a rich region is found; the importance of the first SVM recovers quickly
and, in turn, crawling is dominated by the weight of the first SVM: the
crawler ‘stays’ and downloads documents.

The ‘weight history’ is different at the neutral site (Fig. (32)). Up to about
100 downloads very few relevant documents were found at this site. The weight
of the 5th SVM is slightly positive, whereas the weights of the others are
negative. The 1st and the 2nd SVMs are weighted the ‘worst’; the weights
belonging to these classifiers are large negative numbers. At this site, the
order of the SVMs, which were trained around target documents, is not
appropriate. The situation changes quickly when a rich region is found. In
such regions the 1st SVM takes the lead. Typically, the weight of the 5th SVM
is ranked second. That is, the adaptation concerns mostly whether the crawler
should stay or move ‘far away’. In turn, the information contained in the
‘context’ is relevant and can be used to optimize the behavior of the crawler.
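
The qualitative behavior of the weights can be reproduced with a very small
model. The sketch below assumes a linear value estimate over the outputs of
five SVMs and a plain delta-rule update toward the observed reward. The
update rule, the learning rate `lr`, the reward values of -1 and +1, and the
input patterns are all assumptions made for illustration; the actual rule
and the reward and punishment used in the experiments are not reproduced
here.

```python
def value(weights, svm_outputs):
    """Linear value estimate: weighted sum of the k-step SVM outputs."""
    return sum(w * s for w, s in zip(weights, svm_outputs))


def update(weights, svm_outputs, reward, lr=0.1):
    """One delta-rule step moving the estimate toward the observed reward."""
    error = reward - value(weights, svm_outputs)
    return [w + lr * error * s for w, s in zip(weights, svm_outputs)]


# One weight per SVM (1st = 'one step away', ..., 5th = 'far away').
weights = [0.0] * 5

# Hypothetical poor region: the 1st SVM fires, yet downloads are punished.
for _ in range(20):
    weights = update(weights, [1, 0, 0, 0, 0], reward=-1.0)
poor = list(weights)

# Hypothetical rich region: the 1st SVM fires and downloads are rewarded.
for _ in range(50):
    weights = update(weights, [1, 0, 0, 0, 0], reward=+1.0)
rich = list(weights)
```

Under these assumptions the weight of the 1st SVM is driven strongly
negative in the poor region and recovers quickly once rewards arrive, which
mirrors the pattern described for the neutral site. Because there are only
five weights, a few dozen updates are enough to change the ordering.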

6.6. Conclusions. We have suggested a novel method for web search. The
method combines two popular AI techniques, support vector machines (SVM)
and reinforcement learning (RL). The method has a few adapting parameters
that can be optimized during the search. This parameterization helps the
crawler to adapt to different parts of the web. The outputs of the SVMs,
together, form a set of ‘yardsticks’ for estimating the distance from target
documents. The value (the weight) of the different yardsticks may be very
different in different neighborhoods. The point is that (i) RL is efficient
with good features (the k-step SVMs in this case),

[Plot title: Weight Adaptation. Axis label: Tuning Steps during Crawling]

Figure 32. Change of weights of SVMs in value estimation for
‘neutral’ site.

Horizontal axis: Occasions when value estimation was erroneous and
weights were trained.

and (ii) if there are just a few parameters for RL then these parameters can
be trained quickly by rewarding for target documents. RL has many different
formulations, all of which could be applied here. Most promising are the
approaches that can take into account (many) different criteria in the search
objective [Fraser and Hauge, 1998, Gabor et al., 1998, Dubois et al., 2000].
Also, RL methods are capable of extracting features [Thrun and Schwartz, 1995]
that may complement the prewired SVM features.